GATCTTTGGA TTGGAAACAG TTAAAGAACA ATATGTGCAT GATGATGATT
TTAAAG ATGT GTTTTTGCAT TG7AAGGATG GGAAGGCATG GAATAAATTT
GTTGTAAATG ATGGTTTTGT GT7TAGAGCT AATAAGCTAT GCATTCCAGC
TACCTCTGTT CGTTTGTTGT TGCTACAGGA AGCACATGGA GGTGGTTTGA
TGGGAC ATT T TGGGGCAAAG AAOACGGAGG ACATACTGGC TGCTCATTTC
n-TTGGCCAA AGATGAGGAG AGATG1GGAG AGA7TTATTG CTCGCTGCAC
GACATGTCAA AAGGCCAAGT CACGCTTAAA TCCACACGAT TTGAAGCCAT
ATTTGGGTGA GGGAGATGAG CT7GAGTCGG GGAGGACTCA AATGCAAGAA
GGGGAGGATG ATGAGGACAT CAGCACCATC TATACATCCA CACCTACACC
CACACCATCG CCAACACCAC 1TGGCCCTCT TACTCGTGCC AGTGCCCG1C
AACTGAACCA TCAAGTAAGT TTATTCTTAA ACTCTTCTCC ATCATATTTA
GACAATGGAG ACACGTGCAC TCTTGTTTTG CTTAGGAATG ATGGAGAGGA
CCAGAAGCAT AGGGGATTGG TGrAGGCTCG ATTTGGACAG CAAGACAGCA
CCAACTTACA ACAACCCCCA TCACTTCATA CACAGTCCAT TTTAAGCATG
CAAGCACTTG ATOCAAAACT C5TCAAGTAT ATTTTTAGAT GGATC

DNA SEQUENCES SPECIFIC TO RICE CENTROMERES

This invention was made with United States government support awarded by the 
following agencies: USDA HATCH 3935. The United States has certain rights in this invention. 

Field of the Invention 

The present invention relates generally to molecular biology. In particular, the present 
invention relates to nucleic acid sequences which encode a centromere from rice. 

Background of the Invention 

Among the most distinguishing and characteristic landmarks of chromosomes of higher
eukaryotes is the location of the centromere. The centromeric region is the site for mitotic and
meiotic spindle fiber attachment and is responsible for sister chromatid association. Jiang, J., et
al., Proc. Natl. Acad. Sci. USA, 95:8135-8140 (1998). Therefore, centromeres play a central role
in the process of chromosomal segregation and transmission in cell divisions. Id. The molecular
organization of centromeres has been studied extensively in yeast, Drosophila melanogaster, and
humans.

Centromeric regions usually consist of heterochromatin and are thought to be highly
methylated. Miller, J.T., et al., Theor. Appl. Genet., 96:832-839 (1998). In addition, centromeres
show varying amounts of nontranscribed repetitive sequences, which are referred to as satellite
DNAs. Haaf, T., et al., Cell, 70:681-695 (1992). The predominant class of centromeric DNA is
the alpha-satellite DNA, which is found in diverged form in all centromeres. Id. To the extent
that it is known, alpha-satellite arrays appear to be uninterrupted by other (nonsatellite) DNA
sequences.

In centromeres, naturally occurring satellite arrays range in size from several hundred kb
to several megabases in length. Recent studies, however, suggest that as little as 140 kb of alpha
satellite DNA may be sufficient to confer centromere function in human cells. Harrington, John
J., et al., Nature Genetics, 15:345-355 (1997). Unfortunately, satellite DNA of this size has
proven difficult to clone and propagate stably in microorganisms using conventional cloning
vectors. Id. In large part, the difficulty in propagating satellite DNA stems from the tendency of
tandemly repetitive DNA to recombine into smaller arrays, and this effect increases with the size
of the repetitive array. Id.

As briefly mentioned hereinbefore, functional centromeric sequences have been isolated
and purified from S. cerevisiae (see Clarke et al., Nature, 287:504-509 (1980) and Stinchcomb et
al., J. Molec. Biol. 158:157-179 (1982)). Episomes carrying the yeast centromeric sequences
display proper segregation into daughter yeast cells during mitosis and meiosis, in contrast to
autonomous replication sequence plasmids lacking a centromere.

The best characterized centromeric DNAs originated from the budding yeast S. cerevisiae
(see Clarke and Carbon, Ann. Rev. Genet. 19:29-56 (1985)). The DNA region required for
centromere function in S. cerevisiae is approximately 120 base pairs (hereinafter "bp") long and is
composed of three conserved domains: CDEI, an 8 bp element ((A/G)TCAC(A/G)TG); CDEII, an
extremely (about 90%) AT-rich region of approximately 80 bp; and CDEIII, a 25 bp element
(TGTTT(A/T)TGNTTTCCGAAANNNNAAA). The molecular structure of centromeric DNAs
from the fission yeast Schizosaccharomyces pombe has also been characterized. Several classes
of S. pombe moderately repeated DNA elements have been identified which are found only in the
centromere regions. These centromere-specific repetitive elements have been designated dg (3.8
kb), dh (4 kb), and yn by Yanagida and co-workers (Nakaseko et al., EMBO J. 5:1011-1021
(1986); Nakaseko et al., Nucl. Acids Res. 15:4705-4715 (1987)), and K (6.4 kb), L (6 kb), and B (1
kb) by Carbon and his colleagues (Clarke et al., PNAS 83:8253-8257 (1986); Fishel et al., Mol.
Cell. Biol. 8:754-763 (1988)). The dg element has an AT-rich region and a 600 bp domain
containing numerous small direct repeat motifs. Similarly, the dh element has an overall AT
content approaching 70% and contains many short direct repeats. No nucleotide similarities to
the S. cerevisiae CDEs have been found in the S. pombe elements.

Attempts to demonstrate that the S. pombe centromere-specific repetitive elements can
function individually as centromeres have been unsuccessful. However, large restriction
fragments (65 to 150 kb) carrying the entire fission yeast centromere regions of chromosome 1 or
3 function as centromeres when introduced into acentric episomes (Hahnenberger et al., PNAS
USA 86:577-581 (1989)). These results indicate that either fission yeast centromeres are large
composite structures that cannot be subdivided, or the functional fission yeast centromere
element has not yet been identified.
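
As an illustration only, the S. cerevisiae consensus elements quoted above (CDEI and CDEIII) can be written as regular expressions and used to scan a sequence for a CDEI, AT-rich spacer, CDEIII arrangement. The sketch below is not part of the original disclosure: the function name `find_cde_candidates` and its spacer-length and AT-content thresholds are assumptions chosen to approximate the "approximately 80 bp" and "about 90%" figures given for CDEII.

```python
import re

# Consensus elements as quoted in the text. (A/G) and (A/T) become
# character classes; N matches any of the four bases.
CDEI = re.compile(r"[AG]TCAC[AG]TG")                            # 8 bp
CDEIII = re.compile(r"TGTTT[AT]TG[ACGT]TTTCCGAAA[ACGT]{4}AAA")  # 25 bp


def find_cde_candidates(seq, min_gap=70, max_gap=90, min_at=0.85):
    """Return (start, end) spans that look like CDEI + AT-rich spacer + CDEIII.

    The thresholds are illustrative assumptions, not values from the source.
    """
    seq = seq.upper()
    hits = []
    for m1 in CDEI.finditer(seq):
        # Only consider CDEIII matches located downstream of this CDEI.
        for m3 in CDEIII.finditer(seq, m1.end()):
            spacer = seq[m1.end():m3.start()]
            if not (min_gap <= len(spacer) <= max_gap):
                continue
            at_fraction = (spacer.count("A") + spacer.count("T")) / len(spacer)
            if at_fraction >= min_at:
                hits.append((m1.start(), m3.end()))
    return hits


if __name__ == "__main__":
    # Synthetic test sequence (hypothetical, not a real centromere).
    spacer = "AT" * 36 + "GC" * 4  # 80 bp, 90% A+T
    demo = "GG" + "ATCACATG" + spacer + "TGTTTATGATTTCCGAAAACGTAAA" + "CC"
    print(find_cde_candidates(demo))
```

A scan like this only flags candidates that match the consensus; it says nothing about whether a candidate functions as a centromere.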

In contrast to the detailed studies done in S. cerevisiae and S. pombe, in most eukaryotes
only limited information is available regarding the organization of the centromeres. For
example, limited information is known about plant centromeres. Peacock et al., Proc. Natl.
Acad. Sci. USA, 78:4490-4494 (1981) report the first isolation of a repetitive DNA element from
maize knobs. This repetitive DNA element acts as a neocentromere in certain genetic
backgrounds. A repetitive DNA element has also been cloned from the centromeres of the
supernumerary B chromosomes of maize (see Alfenito, M.R., et al., Genetics 135:589-597
(1993) and Kaszas, E., et al., EMBO J., 15:5246-5255 (1996)). Part of this B-specific DNA
element shows strong homology to the maize sequences. A 180-bp tandem repeat (pAL1 family)
is the major component of the centromeric region of Arabidopsis thaliana chromosomes. The
genomic organization of this repeat family shares similarities to the alpha satellite DNA at the
human centromeres (see Martinez-Sapater, J., et al., Mol. Gen. Genet., 204:417-423 (1986);
Simoens, C.R., Nucleic Acids Res., 16:6753-6766 (1988); Maluszynska, J., et al., Plant J.,
1:159-166 (1991); Round, E.K., et al., Genome Res., 7:1045-1053 (1997)).

As discussed above, very few putative functional centromeres have been cloned from
plants. The cloning of a putative functional centromere from a plant is a necessary first step in
the production of artificial chromosomes suitable for use in plants. Artificial chromosomes are
man-made linear or circular DNA molecules constructed from essential cis-acting DNA sequence
elements that are responsible for the proper replication and partitioning of natural chromosomes
(see Murray et al., Nature, 305:189-193 (1983)). The essential elements of an artificial
chromosome are: (1) autonomous replication sequences (ARS) (which have the properties of replication
origins, the sites for initiation of DNA replication), (2) centromeres (site of
kinetochore assembly and responsible for proper distribution of replicated chromosomes at
mitosis and meiosis), and (3) telomeres (specialized structures at the ends of linear chromosomes
that function to stabilize the ends and facilitate the complete replication of the extreme termini of
the DNA molecule). The use of artificial chromosomes as an alternative to commonly used
methods of introducing new genetic information into cells is steadily increasing.

Summary of the Invention 

The present invention relates to isolated and purified nucleic acids having the nucleotide
sequences shown in: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5,
SEQ ID NO:6 and SEQ ID NO:7.

The present invention also relates to a recombinant DNA construct which contains a rice
centromere. The centromere contains a number of highly repetitive regions of DNA that have the
nucleotide sequence of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID
NO:5, SEQ ID NO:6 and SEQ ID NO:7 or combinations thereof. The recombinant DNA
construct may also contain a yeast autonomous replication sequence, an autonomous replication
sequence from a higher eukaryotic organism, a yeast telomere sequence or a telomere sequence
from a higher eukaryotic organism and a selectable marker gene.

The present invention also relates to a plasmid containing the hereinbefore described 
DNA construct. This plasmid may contain an origin of replication and a selectable marker which 
functions in bacteria (such as E. coli) or in yeast (such as S. cerevisiae).

The present invention relates to a plant artificial chromosome vector. The plant artificial
chromosome vector of the present invention contains an autonomous replication sequence, two
telomere sequences, a centromere sequence having the nucleotide sequence of SEQ ID NO:1,
SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6 and SEQ ID NO:7 or
a combination thereof, and at least one selectable marker sequence. The autonomous replication
sequence may be from yeast or from a higher eukaryotic organism and the telomere sequence
may be from yeast or from a higher eukaryotic organism, such as, but not limited to, Arabidopsis
thaliana.

The present invention also relates to a plant cell transformed with the plant artificial 
chromosome vector hereinbefore described and to transgenic plants containing said plant cell. 
The plant cell and plant may be from Oryza sativa. 

Finally, the present invention relates to a method of identifying centromeric DNA in a 
higher eukaryotic organism. The method involves hybridizing an isolated nucleic acid selected 
from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6 and SEQ ID NO:7 and combinations thereof with a sample of DNA 
from a higher eukaryotic organism and then identifying and isolating the centromeric DNA from 
said sample. 

Brief Description of the Drawings 

FIG. 1 shows the nucleotide sequence of pSau3A9.

FIG. 2A - FIG. 2N show the FISH analysis of rice centromeric DNA elements. The
probes were biotinylated and hybridized in situ to rice chromosomes or DNA fibers. The probes
were detected by fluorescein isothiocyanate-conjugated antibodies (green color) and the
chromosomes were stained with propidium iodide (red color). Probe pRCS1 hybridized
exclusively to the centromeric regions of the chromosomes from rice (FIG. 2A), rye (FIG. 2B),
barley (FIG. 2C), sorghum (FIG. 2D), and maize (FIG. 2E). FISH signals also were detected in
the centromeric regions of the acrocentric B chromosomes (see arrows) from rye (FIG. 2B) and
maize (FIG. 2E). Similarly, rice centromeric DNA families RCH2 (FIG. 2F), RCH1 (FIG. 2G), RCH3
(FIG. 2H), RCE1 (FIG. 2I), RCE2 (FIG. 2J), and RCS2 (FIG. 2K) all were located in the
centromere of every rice chromosome. Two pairs of chromosomes with the strongest signals are
indicated by arrows and the third pair with the weakest signals by arrowheads (FIG. 2K). The
same metaphase cell (FIG. 2K) was washed under medium (FIG. 2L) and high stringencies (FIG.
2M), and most signals were still discernible (FIG. 2N). The marked array between two arrows
is 51 µm long and represents approximately 151-kb DNA. All bars are 10 µm.

FIG. 3 shows a Southern blot of the genomic organization of the RCS1 family. Rice
genomic DNA was digested with Sau3AI (lane 1), DpnII (lane 2), HaeIII (lane 3), MspI (lane 4),
HpaII (lane 5), SalI (lane 6), BamHI (lane 7), DraI (lane 8), EcoRI (lane 9), and HindIII (lane 10)
and probed with pRCS1.

FIG. 4 shows the nucleotide sequence of pRCS2. The 639-bp insert of pRCS2 contains
four copies of a tandemly arranged repeat. The four members (A-D) range from 155 to 165 bp
and share 84-91% sequence identity with one another. F represents the consensus sequence of
the four members.
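
For readers who want to reproduce this kind of comparison on their own aligned repeat members, the following is a minimal sketch of pairwise percent identity and a per-column majority consensus. It assumes the members have already been aligned to equal length with "-" for gaps and that gap columns are excluded from the identity count; the source does not state how its 84-91% figures or its consensus were computed, so both helper functions are illustrative assumptions.

```python
from collections import Counter


def percent_identity(a, b):
    """Percent of aligned columns (gap columns excluded) where a and b agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either member has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def consensus(aligned):
    """Most common non-gap base in each column of an alignment."""
    out = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not the pRCS2 repeat members).
    members = ["ACGTACGT", "ACGTACGA", "ACG-ACGT", "ACCTACGT"]
    print(percent_identity(members[0], members[1]))
    print(consensus(members))
```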

FIG. 5 shows a Southern blot of the genomic organization of the RCS2 family. Rice
genomic DNA was digested with DpnII (lane 1), Sau3AI (lane 2), MspI (lane 3), HpaII (lane 4),
SalI (lane 5), and HaeIII (lane 6), and probed with pRCS2.

FIG. 6A - FIG. 6B show a Southern blot of the conservation of the RCH1 and RCE1
families in Gramineae species. Genomic DNA from sorghum (lane 1), maize (lane 2), sugar
cane (lane 3), Ag. intermedium (lane 4), barley (lane 5), oats (lane 6), rye (lane 7), wheat (lane 8),
Ae. squarrosa (lane 9), rice (lane 10), bamboo (lane 11), Pharus sp. (lane 12), J. effusus (lane
13), C. alternifolius (lane 14) and A. thaliana (lane 15) was digested with HindIII and probed
with pRCH1 (FIG. 6A) and pRCE1 (FIG. 6B).

FIG. 7 shows the nucleotide sequence of RCS1.

FIG. 8 shows the nucleotide sequence of RCS2.

FIG. 9 shows the nucleotide sequence of RCH1.

FIG. 10 shows the nucleotide sequence of RCH2.

FIG. 11 shows the nucleotide sequence of RCH3.

FIG. 12 shows the nucleotide sequence of RCE1.

FIG. 13 shows the nucleotide sequence of RCE2.

Detailed Description of the Invention

Background

The present invention relates to cloned centromeric DNA from Oryza sativa (rice). More
specifically, the inventors of the present invention have discovered that the cloned centromeric
DNA of the present invention contains seven (7) different repetitive regions of complex DNA.
These seven (7) repetitive regions are referred to herein as follows: RCS1, RCS2, RCH1, RCH2,
RCH3, RCE1 and RCE2.

The present invention relates to isolated and purified nucleic acids for each of the seven 
(7) different repetitive regions of centromeric DNA from Oryza sativa. The nucleic acids of the 
present invention encode a functional centromere from Oryza sativa. 



The present invention further relates to the use of the nucleic acids of the present
invention as primers and probes to identify centromeric DNA from other plants and animals. In
addition, the nucleic acid sequences disclosed herein can be used to create a plant artificial
chromosome vector.

Definitions

Units, prefixes, and symbols can be denoted in the SI accepted form. Numeric ranges are
inclusive of the numbers defining the range. Unless otherwise indicated, nucleic acids are
written left to right in 5' to 3' orientation. The headings provided herein are not
limitations of the various aspects or embodiments of the invention which can be had by reference
to the specification as a whole. Accordingly, the terms defined immediately below are more fully
defined by reference to the specification as a whole.

As used herein, the term "plant" includes reference to whole plants, plant organs (e.g.,
leaves, stems, roots, etc.), seeds and plant cells and progeny thereof. The class of plants which
can be used in the methods of the present invention is generally as broad as the class of higher
plants amenable to transformation techniques, including both monocotyledonous and
dicotyledonous plants.

As used herein, the term "transformation" or "transfection" means the acquisition in cells
of new DNA sequences through incorporation of added DNA. This is the process by which
naked DNA, DNA coated with protein, or whole artificial chromosomes are introduced into a
cell, resulting in a heritable change.

As used herein, the term "host" means any organism that is the recipient of a replicable
plasmid or vector comprising a plant artificial chromosome. Preferably, host strains used for
cloning are free of any restriction enzyme activity that might degrade the foreign DNA used.
Preferred examples of host cells for cloning which are useful in the present invention are
bacteria, such as Escherichia coli, Bacillus subtilis, Pseudomonas, Streptomyces, Salmonella,
and yeast cells such as S. cerevisiae. Host cells which can be targeted for expression of a plant
artificial chromosome may be plant cells of any source, such as, but not limited to, Arabidopsis,
maize, rice, sugarcane, sorghum, barley, soybeans, tobacco, wheat, tomato, potato or citrus.

As used herein, the term "linker" means a DNA molecule, generally up to 50 or 60
nucleotides long and synthesized chemically, or cloned from other vectors.

As used herein, the term "plasmid" or "vector" (such as a cloning vector or expression
vector) refers to a closed covalently circular extrachromosomal DNA or linear DNA which is
able to autonomously replicate in a host cell and which is normally nonessential to the survival of
the cell. A wide variety of plasmids and other vectors are well known and commonly used in the
art.

As used herein, "heterologous" when used to describe nucleic acids or polypeptides refers 
to nucleic acids or polypeptides that originate from a foreign species, or, if from the same 
species, are substantially modified from their original form. For example, a promoter operably 
linked to a heterologous structural gene is from a species different from that from which the 
structural gene was derived, or, if from the same species, one or both are substantially modified 
from their original form. 

As used herein, "isolated" includes reference to material which is substantially or 
essentially free from components which normally accompany or interact with it as found in its 
naturally occurring environment. The isolated material optionally comprises material not found 
with the material in its natural environment. 

As used herein, "nucleic acid" includes reference to a deoxyribonucleotide or 
ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, 
encompasses known analogues of natural nucleotides that hybridize to nucleic acids in a manner 
similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid 
sequence includes the complementary sequence thereof. 

As used herein, "operably linked" includes reference to a functional linkage between a
promoter and a second sequence, wherein the promoter sequence initiates and mediates
transcription of the DNA sequence corresponding to the second sequence. Generally, operably
linked means that the nucleic acid sequences being linked are contiguous and, where necessary to
join two protein coding regions, contiguous and in the same reading frame.

WO 01/00858          PCT/US00/17535

As used herein "recombinant" includes reference to a cell, or nucleic acid, or vector, that 
has been modified by the introduction of a heterologous nucleic acid or the alteration of a native 
nucleic acid to a form not native to that cell, or that the cell is derived from a cell so modified. 
For example, recombinant cells express genes that are not found within the native (non- 
recombinant) form of the cell or express native genes that are otherwise abnormally expressed, 
under expressed or not expressed at all. 

As used herein, a "recombinant DNA construct" is a nucleic acid construct, generated
recombinantly or synthetically, with a series of specified nucleic acid elements which permit
transcription and translation of a particular nucleic acid in a target cell. The DNA construct can
be part of a plasmid, vector, virus, or nucleic acid fragment. Typically, the recombinant DNA
construct portion of the construct includes a nucleic acid to be transcribed and translated, and a
promoter. In the present invention, the recombinant DNA construct can be a plant artificial
chromosome.

As used herein, "transgenic plant" includes reference to a plant modified by introduction 
of a heterologous nucleic acid. 

As used herein, "telomere" refers to the end of a chromosome comprising a simple repeat
DNA. The function of a telomere is to allow the ends of a linear DNA molecule to be replicated. 

As used herein, "eukaryote" refers to living organisms whose cells contain nuclei. A
eukaryote may be distinguished from a "prokaryote" which is an organism which lacks nuclei.
Prokaryotes and eukaryotes differ fundamentally in the way their genetic information is
organized, as well as their patterns of RNA and protein synthesis.

As used herein, "lower eukaryote" refers to a eukaryote characterized by a comparatively
simple physiology and composition, and unicellularity. Examples of lower eukaryotes include
flagellates, ciliates, and yeast.

As used herein, "higher eukaryote" refers to a multicellular eukaryote, characterized by
its more complex physiological mechanisms as well as its ability to interact with its
environment in a more sophisticated manner. Generally, more complex organisms such as plants
and animals are included in this category. Preferred higher eukaryotes to be transformed by the
present invention include, for example, monocot and dicot angiosperm species, gymnosperm
species, fern species, plant tissue culture cells of these species, and algal cells. It will of course be
understood that prokaryotes and eukaryotes alike may be transformed by the methods of this
invention.

As used herein, a "selectable marker" is a gene whose presence results in a clear
phenotype, and most often a growth advantage for cells that contain the marker. This growth
advantage may be present under standard conditions, altered conditions such as elevated
temperature, or in the presence of certain chemicals such as herbicides or antibiotics. Examples of
selectable markers include the thymidine kinase gene, the cellular
adenine-phosphoribosyltransferase gene and the dihydrofolate reductase gene, hygromycin
phosphotransferase genes, the bar gene and the neomycin phosphotransferase genes, among
others. Preferred selectable markers in the present invention include genes whose expression
confers antibiotic or herbicide resistance to the host cell, sufficient to enable the maintenance of a
vector within a host cell, and which facilitate the manipulation of a plasmid into new host cells.






As used herein, "nucleotide" refers to one of the monomeric units from which DNA or
RNA polymers are constructed, consisting of a purine or pyrimidine base, a pentose, and a
phosphoric acid group. The nucleotides of DNA are deoxyadenylic acid, thymidylic acid,
deoxyguanylic acid, and deoxycytidylic acid. The corresponding nucleotides of RNA are
adenylic acid, uridylic acid, guanylic acid, and cytidylic acid.

SEQUENCE LISTING

The present application also contains a sequence listing that contains 8 sequences. The 
sequence listing contains nucleotide sequences. For the nucleotide sequences, the base pairs are 
represented by the following base codes: 



Symbol    Meaning

A         A; adenine
C         C; cytosine
G         G; guanine
T         T; thymine
U         U; uracil
M         A or C
R         A or G
W         A or T/U
S         C or G
Y         C or T/U
K         G or T/U
V         A or C or G; not T/U
H         A or C or T/U; not G
D         A or G or T/U; not C
B         C or G or T/U; not A
N         (A or C or G or T/U)
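
The base codes above map directly onto a lookup table. The following sketch, which is illustrative and not part of the sequence listing, shows one way to expand an ambiguity-coded sequence into the concrete sequences it represents. The names `IUPAC`, `expand` and `degeneracy` are hypothetical, and folding U into T is an assumption that suits DNA-only work.

```python
from itertools import product

# Base codes from the table; U is folded into T here (an assumption
# made for DNA-only use).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def expand(code_seq):
    """List every concrete DNA sequence represented by an ambiguity-coded one."""
    choices = [IUPAC[ch] for ch in code_seq.upper()]
    return ["".join(bases) for bases in product(*choices)]


def degeneracy(code_seq):
    """Number of concrete sequences represented, without enumerating them."""
    n = 1
    for ch in code_seq.upper():
        n *= len(IUPAC[ch])
    return n


if __name__ == "__main__":
    print(expand("RY"))
    print(degeneracy("RTCACRTG"))
```

Because the number of expansions grows multiplicatively with each ambiguous position, `degeneracy` is the safer call for long sequences; `expand` is practical only for short motifs.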



Nucleic Acid Sequences 

In one embodiment, the present invention relates to isolated and purified nucleic acids
which encode a functional centromere from Oryza sativa. As used herein, the term "functional
centromere" refers to the centromere or chromosome site that directs or supports kinetochore
formation. The kinetochore is the physical structure that mediates the attachment of the spindle
fibers to the chromosome and is therefore responsible for the proper partition of the
chromosomes at mitosis and meiosis.

The nucleic acids of the present invention encode seven (7) different repetitive regions of
centromeric DNA from Oryza sativa. Exemplary nucleic acids for such centromeres have the
nucleotide sequences shown in SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4,
SEQ ID NO:5, SEQ ID NO:6 and SEQ ID NO:7 or combinations thereof. SEQ ID NO:1 is also
referred to herein as RCS1. SEQ ID NO:2 is also referred to herein as RCS2. SEQ ID NO:3 is
also referred to herein as RCH1. SEQ ID NO:4 is also referred to herein as RCH2. SEQ ID
NO:5 is referred to herein as RCH3. SEQ ID NO:6 is referred to herein as RCE1. SEQ ID NO:7
is referred to herein as RCE2.

The present invention also contemplates nucleic acids which hybridize under stringent
hybridization conditions to the nucleotide sequences set forth above. Generally, stringent
conditions are selected to be about 5°C to about 20°C lower than the thermal melting point (Tm)
for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under
defined ionic strength and pH 7) at which 50% of the target sequence hybridizes to a perfectly
matched probe. Typically, stringent wash conditions are those in which the salt concentration is
about 0.22 molar at pH 7 and the temperature is at least about 50°C. However, nucleic acids
which do not hybridize to each other under stringent conditions are still substantially identical if
they encode a substantially identical and functional centromere. This may occur, e.g., when a copy
of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
The present invention also contemplates naturally occurring allelic variations and mutations of
the nucleotide sequences set forth above so long as those variations and mutations code, on
expression, for a functional centromere.
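
The "about 5°C to about 20°C lower than the Tm" window can be worked through numerically. The sketch below is illustrative only: the source does not say how Tm is to be estimated, so the Wallace rule of thumb for short oligonucleotides (2°C per A or T, 4°C per G or C) is used here purely as an assumption, and both function names are hypothetical. For long probes a measured or salt-corrected Tm should be substituted.

```python
def wallace_tm(oligo):
    """Rule-of-thumb Tm in degrees C for a short oligo: 2*(A+T) + 4*(G+C).

    This estimate is an assumption for illustration; it is not taken from
    the source and is only meaningful for short oligonucleotides.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


def stringent_window(tm):
    """Temperature range from 20 to 5 degrees C below Tm, as (low, high)."""
    return (tm - 20, tm - 5)


if __name__ == "__main__":
    probe = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer
    tm = wallace_tm(probe)
    print(tm, stringent_window(tm))
```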

As is well known in the art, because of the degeneracy of the genetic code, there are
numerous other DNA and RNA molecules that can code for the same functional centromere as
encoded by SEQ ID NOS: 1-7, or portions thereof. The present invention, therefore,
contemplates those other DNA and RNA molecules, which, on expression, encode for a
functional centromere encoded by the nucleic acid sequences of SEQ ID NOS: 1-7 or portions
thereof. With knowledge of all triplet codons for each particular amino acid residue, it is
possible to describe all such encoding RNA and DNA sequences. DNA and RNA molecules
other than those specifically disclosed herein, which are characterized simply by a
change in a codon for a particular amino acid, are within the scope of this invention. A table of
codons representing particular amino acids is set forth below in Table 1.

TABLE 1

First Position            Second Position             Third Position
(5' end)          T/U     C       A       G           (3' end)

T/U               Phe     Ser     Tyr     Cys         T/U
                  Phe     Ser     Tyr     Cys         C
                  Leu     Ser     Stop    Stop        A
                  Leu     Ser     Stop    Trp         G

C                 Leu     Pro     His     Arg         T/U
                  Leu     Pro     His     Arg         C
                  Leu     Pro     Gln     Arg         A
                  Leu     Pro     Gln     Arg         G

A                 Ile     Thr     Asn     Ser         T/U
                  Ile     Thr     Asn     Ser         C
                  Ile     Thr     Lys     Arg         A
                  Met     Thr     Lys     Arg         G

G                 Val     Ala     Asp     Gly         T/U
                  Val     Ala     Asp     Gly         C
                  Val     Ala     Glu     Gly         A
                  Val     Ala     Glu     Gly         G
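
The statement that all encoding sequences can be described from the triplet codons can be made concrete with a short enumeration. The sketch below builds the standard genetic code programmatically and lists every DNA sequence that encodes a given peptide. It is an illustration under stated assumptions: the one-letter amino acid codes and the function names `count_encodings` and `all_encodings` are not from the source, and the standard (nuclear) genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest); "*" is Stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Reverse lookup: amino acid -> list of synonymous codons.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_encodings(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    n = 1
    for aa in peptide.upper():
        n *= len(CODONS_FOR[aa])
    return n


def all_encodings(peptide):
    """Every DNA sequence encoding the peptide (practical for short peptides only)."""
    options = [CODONS_FOR[aa] for aa in peptide.upper()]
    return ["".join(codons) for codons in product(*options)]


if __name__ == "__main__":
    print(count_encodings("LS"))  # Leu and Ser each have six codons
    print(all_encodings("MW"))    # Met and Trp each have a single codon
```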



The nucleic acid sequences of the present invention can be used in marker-aided selection
using techniques which are well-known in the art. Marker-aided selection does not require the
complete sequence of the gene or precise knowledge of which sequence confers which
specificity. Instead, partial sequences can be used as hybridization probes or as the basis for
oligonucleotide primers to amplify by PCR or other methods to identify nucleic acids specific for
functional centromeric DNA in other plants and animals.



Plant Artificial Chromosome 
In a second embodiment, the present invention relates to a plant artificial chromosome.
More specifically, the nucleic acid sequences of the present invention can be used to construct a
plant artificial chromosome vector. A plant artificial chromosome must contain the following
essential elements: (1) autonomous replication sequences (hereinafter referred to as "ARS"), (2)
a centromere which is functional in plants, and (3) telomeres which are functional in plants.

Autonomous Replication Sequences 

ARSs have been isolated from the unicellular fungi Saccharomyces cerevisiae (brewer's
yeast) and Schizosaccharomyces pombe (see Stinchcomb et al., Nature 282:39-43 (1979) and
Hsiao et al., Proc. Natl. Acad. Sci. USA 76:3829-3833 (1979)). ARSs behave like replication
origins, allowing DNA molecules that contain the ARS to be replicated as an episome after
introduction into the cell nuclei of these fungi. Although plasmids containing these sequences
replicate, they do not segregate properly.

U.S. Patent 5,270,201 (hereinafter the "'201 Patent"), hereby incorporated by reference,
discloses a method for isolating ARS sequences for use in higher eukaryotic organisms, by the
formation of minichromosomes derived from natural chromosomes. It has been demonstrated in
yeast that inverted repeats of telomeric sequences are "resolved" by an unknown mechanism
which results in a double-stranded cleavage between inverted repeats. After an inverted telomere
repeat is introduced into a chromosome, a resolution reaction will lead to scission of the
chromosome and formation of two chromosomal fragments, each with two telomeres. This
process generates a minichromosome small enough to be isolated intact, allowing further
manipulation by in vitro techniques to delimit the sequences required for autonomous replication.

A second approach for obtaining an ARS is also disclosed in the '201 Patent. This
approach is referred to as a "shotgun cloning approach". Higher eukaryotic organisms have
many replication origins distributed throughout their genomes. For example, the A. thaliana
genome contains approximately 1000 origins spaced every 70 kb along the chromosome.
Therefore, the shotgun cloning approach involves looking for random fragments of genomic
DNA throughout the genome of interest which promote extrachromosomal replication.
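
As a rough back-of-the-envelope check on the figures quoted for A. thaliana, the sketch below multiplies the origin count by the spacing and estimates how many origins a random fragment would carry on average. It assumes origins are evenly spaced, which is a simplification introduced here for illustration; the function names are hypothetical.

```python
def implied_genome_size_kb(n_origins, spacing_kb):
    """Genome length implied by evenly spaced origins."""
    return n_origins * spacing_kb


def expected_origins(fragment_kb, spacing_kb):
    """Average number of origins on a random fragment, assuming even spacing."""
    return fragment_kb / spacing_kb


if __name__ == "__main__":
    # Figures from the text: about 1000 origins, one every 70 kb.
    print(implied_genome_size_kb(1000, 70))  # 70000 kb, i.e. about 70 Mb
    print(expected_origins(35, 70))          # 0.5 origins per 35 kb fragment
```

Under this simplification, larger cloned fragments are proportionally more likely to carry an origin, which is the intuition behind screening random fragments for extrachromosomal replication.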

Autonomous replication sequences for use in the plant artificial chromosome of the
present invention can be obtained using methods which are well known in the art. Autonomous
replication sequences from yeast, such as those described above, can be used in the present
invention. Moreover, ARS sequences from higher eukaryotic organisms obtained using the
methods described in the '201 Patent can also be used in the plant artificial chromosome of the
present invention.

Telomeres 

Telomeres are believed to be involved in the priming of DNA replication at the
chromosome end (see Blackburn et al., Ann. Rev. Biochem. 53:163-194 (1984)). This is because
conventional DNA polymerases are template dependent, synthesize DNA in the 5' to 3' direction, 
and require an oligonucleotide primer to donate a 3' OH group. When this primer is removed, 
unreplicated single-stranded gaps arise; most of these gaps can be filled in by priming from 3' 
OH groups donated by newly replicated strands located at the 5' end of the gap. However, the 
unreplicated gaps which lie next to the extreme 5' end of the DNA duplex cannot be primed in 
this manner. Consequently, telomeres must provide an alternative priming mechanism. 

Telomeres are also responsible for the stability of chromosomal termini. Telomeres act as
"caps," suppressing the recombinogenic properties of free, unmodified DNA ends (see Blackburn
et al., Ann. Rev. Biochem. 53:163-194 (1984)). This reduces the formation of damaged and
rearranged chromosomes which arise as a consequence of recombination-mediated chromosome
fusion events.

Telomeres may also contribute to the establishment or maintenance of intranuclear 
chromatin organization through their association with the nuclear envelope (see, for example, 
Fussell, C. P., Genetica 62:192-201 (1984)). 

Telomeric or telomeric-like DNA sequences have been cloned from several lower
eukaryotic organisms, principally protozoans and yeast. The ends of the Tetrahymena linear
DNA plasmid have been shown to function like a telomere on linear plasmids in S. cerevisiae
(see Szostak, J. W., Cold Spring Harbor Symp. Quant. Biol. 47:1187-1194 (1983)). A telomere
from the flagellate Trypanosoma has been cloned (see, for example, Blackburn et al., Cell
36:447-457 (1984)). A yeast telomeric sequence has been identified (see, for example, Shampay
et al., Nature 310:154-157 (1984)).

U.S. Patent 5,270,201 discloses a method for obtaining a telomere from a higher
eukaryotic organism, specifically, from Arabidopsis thaliana. The telomeric sequences disclosed
in the '201 Patent contain a tandem repeat of the sequence 5'-CCCTAAA-3'.
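The notion of a tandem repeat of a 7-base unit can be illustrated with a short Python sketch. It is offered only as an illustration and is not part of the '201 Patent; the function names and the 5-copy test array are hypothetical, and only the repeat unit itself is taken from the text above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def count_leading_tandem_copies(seq: str, unit: str) -> int:
    """Count how many uninterrupted copies of `unit` start the sequence."""
    copies = 0
    while seq.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


UNIT = "CCCTAAA"            # repeat unit quoted in the text
ARRAY = UNIT * 5 + "GATC"   # hypothetical 5-copy array plus flanking bases

# The complementary strand carries the G-rich form of the same repeat.
print(reverse_complement(UNIT))
print(count_leading_tandem_copies(ARRAY, UNIT))
```

Reading the unit on the opposite strand gives the G-rich form of the same repeat, which is why either orientation may appear in sequence data.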

Any telomeric sequence which produces a telomere which is functional in plants can be 
inserted into the plant artificial chromosome of the present invention. The telomeric sequence 
may be from yeast or from a higher eukaryotic organism as described above. Preferably, the 
plant artificial chromosome of the present invention will contain two (2) telomeric sequences. 

Construction of a Plant Artificial Chromosome 

Once the essential elements of a plant artificial chromosome are obtained (the ARS,
centromere and telomeres), a plant artificial chromosome vector can be constructed using
methods which are well-known in the art (see, for example, Maniatis, T., et al., Molecular
Cloning: A Laboratory Manual (Cold Spring Harbor, 1982)).

In addition to the essential elements described above, positive and negative
selectable plant markers (for example, antibiotic or herbicide resistance genes) and a cloning site
for insertion of foreign DNA are preferably included. In order to propagate vectors in E. coli, it
is necessary to convert the linear molecule into a circle by the addition of a stuffer fragment
between the telomeres. In addition to the stuffer fragment, the artificial plant chromosome may
also contain an origin of replication that can function in plants.

Artificial plant chromosomes which replicate in yeast also may be constructed to take 
advantage of the large insert capacity and stability of repetitive DNA inserts afforded by this 
system (Burke et al., Science, 236:806-812 (1987)). In this case, yeast ARS and centromere 
sequences are added to the artificial chromosome. The artificial chromosome is maintained in 
yeast as a circular molecule using a stuffer fragment to separate the telomeres.

Nucleic acids for the essential components of the plant artificial chromosome, obtained
from any source whatsoever, may be purified and inserted into the plant artificial chromosome at
any appropriate restriction endonuclease cleavage site. The nucleic acids usually will contain
various regulatory signals (for example, promoters, termination segments, enhancers, etc., which
are well known in the art) that allow for the expression of proteins encoded by the nucleic acids.
Alternatively, regulatory signals residing in the artificial chromosome may be utilized. 

The techniques and procedures required to accomplish insertion are well-known in the art
(see Maniatis et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory,
Cold Spring Harbor, N.Y. (1982)). Typically, this is accomplished by incubating a circular
plasmid or a linear DNA fragment in the presence of a restriction endonuclease such that the
restriction endonuclease cleaves the DNA molecule. Endonucleases preferentially break the
internal phosphodiester bonds of polynucleotide chains. They may be relatively unspecific,
cutting polynucleotide bonds regardless of the surrounding nucleotide sequence. However, the
endonucleases which cleave only a specific nucleotide sequence are called restriction enzymes.
Restriction endonucleases generally internally cleave DNA molecules at specific recognition
sites, making breaks within "recognition" sequences that in many, but not all, cases exhibit two-fold
symmetry around a given point. Such enzymes typically create double-stranded breaks.

Many of these enzymes make a staggered cleavage, yielding DNA fragments with
protruding single-stranded 5' or 3' termini. Such ends are said to be "sticky" or "cohesive"
because they will hydrogen bond to complementary 3' or 5' ends. As a result, the end of any
DNA fragment produced by an enzyme, such as EcoRI, can anneal with any other fragment
produced by that enzyme. This property allows splicing of foreign genes into plasmids, for
example. Some restriction endonucleases that may be particularly useful with the current
invention include HindIII, PstI, EcoRI, and BamHI.
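As a rough illustration of two-fold symmetry and staggered cleavage, the following Python sketch checks whether a recognition site equals its own reverse complement and cuts the top strand of a linear sequence. The recognition sequences and cut offsets are the commonly published ones for these four enzymes; the function names and the test sequence are hypothetical, and the sketch ignores the bottom strand, methylation, and star activity.

```python
# Recognition site and top-strand cut offset (bases from the site's 5' end).
SITES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC, 5' overhang
    "BamHI": ("GGATCC", 1),    # G^GATCC, 5' overhang
    "HindIII": ("AAGCTT", 1),  # A^AGCTT, 5' overhang
    "PstI": ("CTGCAG", 5),     # CTGCA^G, 3' overhang
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def is_twofold_symmetric(site: str) -> bool:
    """A site has two-fold symmetry if it reads the same on both strands."""
    return site == reverse_complement(site)


def digest(seq: str, enzyme: str) -> list[str]:
    """Cut the top strand of a linear sequence at every site of `enzyme`."""
    site, offset = SITES[enzyme]
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + offset])
        start = pos + offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Hypothetical test sequence containing two EcoRI sites.
print(digest("AAGAATTCTTGAATTCGG", "EcoRI"))
print({name: is_twofold_symmetric(site) for name, (site, _) in SITES.items()})
```

Every internal fragment produced this way begins with the same overhang bases (AATT for EcoRI), which is the property that lets any two such ends anneal.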

Some endonucleases create fragments that have blunt ends, that is, that lack any 
protruding single strands. An alternative way to create blunt ends is to use a restriction enzyme 
that leaves overhangs, but to fill in the overhangs with a polymerase, such as Klenow, thereby
resulting in blunt ends. When DNA has been cleaved with restriction enzymes that cut across 
both strands at the same position, blunt end ligation can be used to join the fragments directly 
together. The advantage of this technique is that any pair of ends may be joined together, 
irrespective of sequence. 

Those nucleases that preferentially break off terminal nucleotides are referred to as
exonucleases. For example, small deletions can be produced in any DNA molecule by treatment
with an exonuclease which starts from each 3' end of the DNA and chews away single strands in
a 3' to 5' direction, creating a population of DNA molecules with single-stranded fragments at
each end, some containing terminal nucleotides. Similarly, exonucleases that digest DNA from
the 5' end or enzymes that remove nucleotides from both strands have often been used. Some
exonucleases which may be particularly useful in the present invention include Bal31, S1, and
ExoIII. These nucleolytic reactions can be controlled by varying the time of incubation, the
temperature, and the enzyme concentration needed to make deletions. Phosphatases and kinases
also may be used to control which fragments have ends which can be joined. Examples of useful
phosphatases include shrimp alkaline phosphatase and calf intestinal alkaline phosphatase. An
example of a useful kinase is T4 polynucleotide kinase.

Once the source DNA sequences and vector sequences have been cleaved and modified to 
generate appropriate ends they are incubated together with enzymes capable of mediating the 
ligation of the two DNA molecules. Particularly useful enzymes for this purpose include T4 
ligase, E. coli ligase, or other similar enzymes. The action of these enzymes results in the sealing 
of the linear DNA to produce a larger DNA molecule containing the desired fragment (see, for 
example, U.S. Pat. Nos. 4,237,224; 4,264,731; 4,273,875; 4,322,499 and 4,336,336, which are 
specifically incorporated herein by reference). 

It is to be understood that the termini of the linearized plasmid and the termini of the 
DNA fragment being inserted must be complementary or blunt in order for the ligation reaction 
to be successful. Suitable complementarity can be achieved by choosing appropriate restriction 
endonucleases (i.e., if the fragment is produced by the same restriction endonuclease or one that 
generates the same overhang as that used to linearize the plasmid, then the termini of both 
molecules will be complementary). As discussed previously, in a preferred embodiment, at least 
two classes of the vectors used in the present invention are adapted to receive the foreign 
oligonucleotide fragments in only one orientation. After joining the DNA segment to the vector, 
the resulting hybrid DNA can then be selected from among the large population of clones or 
libraries. 

A method useful for the molecular cloning of DNA sequences includes in vitro joining of
DNA segments, fragmented from a source high molecular weight genomic DNA, to vector DNA
molecules capable of independent replication. The cloning vector may include plasmid DNA
(see Cohen et al., Proc. Natl. Acad. Sci. USA, 70:3240 (1973)), phage DNA (see Thomas et al.,
Proc. Natl. Acad. Sci. USA, 71:4579 (1974)), SV40 DNA (see Nussbaum et al., Proc. Natl.
Acad. Sci. USA, 73:1068 (1976)), yeast DNA, E. coli DNA and, most significantly, plant DNA.

A variety of processes are known which may be utilized to effect transformation; i.e., the
inserting of heterologous DNA sequences into a host cell, whereby the host becomes capable of
efficient expression of the inserted sequences.

Transformed Host Cells and Transgenic Plants 

Methods and compositions for transforming a bacterium, a yeast cell, a plant cell, or an 
entire plant with one or more plant artificial chromosome vectors are further aspects of the 
present invention. 

Means for transforming bacteria and yeast cells are well known in the art. Typically, 
means of transformation are similar to those well known means used to transform other bacteria 
or yeast such as E. coli or Saccharomyces cerevisiae. Methods for DNA transformation of plant 
cells include Agrobacterium-mediated plant transformation, protoplast transformation, gene 
transfer into pollen, injection into reproductive organs, injection into immature embryos and 
particle bombardment. There are various advantages and disadvantages associated with each of 
these methods. 

Methods for transforming plant cells include any method by which DNA can be
introduced into a cell, such as by Agrobacterium infection or by direct delivery of DNA, for example by
PEG-mediated transformation of protoplasts, by desiccation/inhibition-mediated DNA uptake, by
electroporation, by agitation with silicon carbide fibers, by acceleration of DNA-coated particles,
etc.

Many methods for delivering genes into cells are known and well described. These
methods include: (1) chemical methods (Graham et al., Virology, 54(2):536-539 (1973);
Zatloukal et al., Ann. N.Y. Acad. Sci., 660:136-153 (1992)); (2) physical methods such as
microinjection (Capecchi, Cell, 22(2):479-488 (1980)), electroporation (Wong et al., Biochim.
Biophys. Res. Commun., 107(2):584-587 (1982); Fromm et al., Proc. Natl. Acad. Sci. USA,
82(17):5824-5828 (1985); U.S. Patent 5,384,253) and microprojectile bombardment (i.e., the
gene gun) (Johnston et al., Methods Cell Biol., 43(A):353-365 (1994); Fynan et al., Proc. Natl.
Acad. Sci. USA, 90(24):11478-11482 (1993)); (3) viral vectors (Clapp, Clin. Perinatol.,
20(1):155-168 (1993); Lu et al., J. Exp. Med., 178(6):2089-2096 (1993); Eglitis et al.,
Biotechniques, 6(7):608-614 (1988); Eglitis et al., Adv. Exp. Med. Biol., 241:19-27 (1988)); and
(4) receptor-mediated mechanisms (Curiel et al., Proc. Natl. Acad. Sci. USA, 88(19):8850-8854
(1991); Curiel et al., Hum. Gen. Ther., 3(2):147-154 (1992); Wagner et al., Proc. Natl. Acad. Sci.
USA, 89(13):6099-6103 (1992)).

Agrobacterium-mediated transformation is a widely applicable system for introducing
genes into plant cells because the DNA can be introduced into whole plant tissues. The use of
Agrobacterium-mediated plant integrating vectors to introduce DNA into plant cells is well
known in the art. Using conventional transformation vectors, chromosomal integration is
required for stable inheritance of the foreign DNA. However, the artificial plant chromosome
vector described herein may be used for transformation with or without integration, as the
centromere function required for stable inheritance is encoded within the plant artificial
chromosome. In particular embodiments, transformation events in which the plant artificial
chromosome is not chromosomally integrated may be preferred, in that problems with
site-specific variations in expression and insertional mutagenesis may be avoided.

The integration of the Ti-DNA is a relatively precise process resulting in few
rearrangements. The region of DNA to be transferred is defined by the border sequences, and
intervening DNA is usually inserted into the plant genome as described (Spielmann et al., Mol.
Gen. Genet., 205:34 (1986); Jorgensen et al., Mol. Gen. Genet., (1987)). Modern Agrobacterium
transformation vectors are capable of replication in E. coli as well as Agrobacterium, allowing
for convenient manipulations as described (Klee et al., Bio/Technology, 3:637-642 (1985)).
Moreover, recent technological advances in vectors for Agrobacterium-mediated gene transfer
have improved the arrangement of genes and restriction sites in the vectors to facilitate
construction of vectors capable of expressing various polypeptide coding genes. The vectors
described (Rogers et al., Meth. Enzymol., 153:253-277 (1987)) have convenient multi-linker
regions flanked by a promoter and a polyadenylation site for direct expression of inserted
polypeptide coding genes and are suitable for present purposes. In addition, Agrobacterium
containing both armed and disarmed Ti genes can be used for the transformations. In those plant
strains where Agrobacterium-mediated transformation is efficient, it is the method of choice
because of the facile and defined nature of the gene transfer.

Agrobacterium-mediated transformation of leaf disks and other tissues such as cotyledons
and hypocotyls appears to be limited to plants that Agrobacterium naturally infects.
Agrobacterium-mediated transformation is most efficient in dicotyledonous plants. Few
monocots appear to be natural hosts for Agrobacterium, although transgenic plants have been
produced in asparagus and more significantly in maize using Agrobacterium vectors as described
(Bytebier et al., Proc. Natl. Acad. Sci. USA, 84:5345 (1987); U.S. Patent No. 5,591,616,
specifically incorporated herein by reference). Therefore, commercially important cereal grains
such as rice, corn, and wheat must usually be transformed using alternative methods. However,
as mentioned above, the transformation of asparagus using Agrobacterium also can be achieved
(see, for example, Bytebier et al., Proc. Natl. Acad. Sci. USA, 84:5345 (1987)).

Other Transformation Methods 

Transformation of plant protoplasts can be achieved using methods based on calcium
phosphate precipitation, polyethylene glycol treatment, electroporation, and combinations of
these treatments (see, for example, Potrykus et al., Mol. Gen. Genet., 199:183 (1985); Lorz et al.,
Mol. Gen. Genet., 199:178 (1985); Fromm et al., Nature, 312:791-793 (1986); Uchimiya et al.,
Mol. Gen. Genet., 204:204 (1986); Callis et al., Genes and Development, 1:1183 (1987);
Marcotte et al., Nature, 335:454 (1988)).

Application of these systems to different plant strains for the purpose of making
transgenic plants depends upon the ability to regenerate that particular plant strain from
protoplasts. Illustrative methods for the regeneration of cereals from protoplasts are described
(Fujimura et al., Plant Tissue Culture Letters, 2:74 (1985); Toriyama et al., Theor. Appl. Genet.,
73:16 (1986); Yamada et al., Plant Cell Rep., 4:85 (1986); Abdullah et al., Biotechnology,
4:1087 (1986)).

To transform plant strains that cannot be successfully regenerated from protoplasts, other
ways to introduce DNA into intact cells or tissues can be utilized. For example, regeneration of
cereals from immature embryos or explants can be effected as described (Vasil, Biotechnology,
6:397 (1988)). In addition, "particle gun" or high-velocity microprojectile technology can be
utilized (Vasil et al., Biotechnology, 10:667-674 (1992)).

Using the latter technology, DNA is carried through the cell wall and into the cytoplasm
on the surface of small metal particles as described (Klein et al., Nature, 327:70-73 (1987); Klein
et al., Proc. Natl. Acad. Sci. USA, 85:8502-8505 (1988); McCabe et al., Biotechnology, 6:923
(1988)). The metal particles penetrate through several layers of cells and thus allow the
transformation of cells within tissue explants.

By way of example, and not of limitation, Examples of the present invention will now be
given.

EXAMPLE 1: CLONING OF CENTROMERIC DNA FROM ORYZA SATIVA

1. Materials and Methods.

a. A rice BAC library was constructed from an indica rice (Oryza sativa ssp. Indica) line
IR-BB21 and consisted of 11,000 clones (see Wang, G.-L., et al., Plant J. 7:525-533 (1995),
herein incorporated by reference). The cereal centromeric DNA element pSau3A9 (Jiang, J., et
al., Proc. Natl. Acad. Sci. USA 93:14210-14213 (1996), herein incorporated by reference) was
used to isolate the rice centromere-specific BAC clones. The DNA sequence of pSau3A9 is
shown in Figure 1 and SEQ ID NO:8. Rice lines used in this example include a javanica rice (O.
sativa ssp. Javanica) line DV85, a japonica rice (O. sativa ssp. Japonica) line Norin 28, an
indica rice line IR72, and four other Oryza species (O. glaberrima, O. rufipogon, O. officinalis,
and O. alta). Gramineae species used in conservation studies include two species from the
Bambusoideae subfamily [bamboo (Bambusa vulgaris) and Pharus sp.], three species from the
Panicoideae subfamily [sorghum, maize (Zea mays), and sugar cane (Saccharum officinarum)], and
six species from the Pooideae subfamily [Agropyron intermedium, barley (Hordeum vulgare),
oats (Avena sativa), rye (Secale cereale), wheat (Triticum aestivum), and Aegilops squarrosa].
Three non-Gramineae species, Juncus effusus, Cyperus alternifolius, and A. thaliana, and rye and
maize lines containing B chromosomes also were included.

b. BAC Library Screening. BAC filter preparations and BAC library screening were
conducted as described in Wang, G.-L., et al., Plant J. 7:525-533 (1995); Hoheisel, J. D., et al.,
Cell 73:109-120 (1993), herein incorporated by reference. BAC clones were isolated by using
pSau3A9 as a probe, and their cytological locations were confirmed by fluorescence in situ
hybridization (hereinafter referred to as "FISH").

c. Subcloning and Sequencing. DNA fragments recovered from agarose gels were
subcloned into pUC18 plasmids as described in Jiang, J., et al., Proc. Natl. Acad. Sci. USA
93:14210-14213 (1996), herein incorporated by reference. Cycle sequencing reactions were
performed by using Applied Biosystems AmpliTaq DNA polymerase, FS Dye Terminator Ready
Reactions kit, and a Perkin-Elmer Thermocycler (model 2400). Reaction products were analyzed
on an Applied Biosystems DNA sequencer (model 373).

d. Southern Blot Hybridization. Plant genomic DNA was isolated as described in Gill,
K. S., et al., Genome 34:362-374 (1991), herein incorporated by reference. BAC DNA was
prepared by using an alkaline lysis method described in Sambrook, J., Fritsch, E. F. & Maniatis,
T. (1989) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview,
NY), 2nd Ed., pp. 1.25-1.26, herein incorporated by reference, and purified by CsCl
ultracentrifugation. Gel transfers, prehybridizations, hybridizations, and posthybridization
washing were all as previously described in Jiang, J., et al., Proc. Natl. Acad. Sci. USA
93:14210-14213 (1996).

e. Slot Blot Hybridization. The copy number of each subclone in the rice genome was
determined by slot blot hybridization as described in Zhao, X., et al., Theor. Appl. Genet.
78:201-209 (1989), herein incorporated by reference. Band intensities were measured on the
autoradiographs by IPLab Spectrum v3.1 software.

f. FISH. Detailed protocols for FISH and Fiber-FISH are described in Jiang, J., et al.,
Proc. Natl. Acad. Sci. USA 93:14210-14213 (1996) and Fransz, P. F., et al., Plant J. 9:421-430
(1996), herein incorporated by reference. The formamide in the hybridization mixture was 50%
and 30% in regular and low stringency hybridizations, respectively. Washing was conducted at
either low [2× saline sodium citrate (SSC) at 42°C for 15 minutes], medium (50% formamide at
45°C for 15 minutes) or high stringency (70% formamide at 50°C for 15 minutes).

2. A rice BAC library constructed from indica rice (Oryza sativa ssp. Indica) and described by
Wang, G.-L., et al., Plant J. 7:525-533 (1995), was screened by using pSau3A9 as a probe.
Twenty-two clones showed unambiguous positive hybridizations. Ten of the 22 clones were
analyzed cytologically by FISH. Eight clones hybridized to the centromeric and/or
paracentromeric regions of all rice chromosomes. Clone 17p22 showed bright and sharp signals
specific to the centromeric regions. At a low hybridization stringency, this clone also hybridized
exclusively to the centromeric regions of chromosomes from sorghum, maize, wheat, barley, oats
and rye.

DNA from clone 17p22 was digested with the restriction enzymes BamHI, DraI, EcoRI,
HaeIII, HindIII, MspI, PstI, Sau3AI and SalI and blotted onto nylon membrane. Small DNA
fragments ranging from 0.5 to 3 kb were subcloned, and their distinctiveness was confirmed by
Southern hybridization using blots containing 17p22 DNA digested with the above described
nine restriction enzymes. Seven different DNA families, including two Sau3AI fragments
(subclones pRCS1 and pRCS2), three HindIII fragments (subclones pRCH1, pRCH2 and
pRCH3), and two EcoRI fragments (subclones pRCE1 and pRCE2), were identified (see below in
Table 2). These seven families hybridized to all of the fragments generated by the nine enzymes.
FISH and Southern hybridization analysis indicated that all seven elements are repetitive in the
rice genome (see below).

Table 2

Summary of the Seven Rice Centromeric Repetitive DNA Families

Family   SEQ ID NO:   Size, bp   GC content, %   Organization pattern   Copy number*   Conservation
RCS1     1            1,478      40              Dispersed              130            Gramineae
RCS2     2            639        41              Tandem                 6,200#         Oryza
RCH1     3            827        45              Dispersed              53             Gramineae
RCH2     4            1,201      46              Dispersed              99             Gramineae
RCH3     5            1,341      48              Dispersed              67             Gramineae
RCE1     6            701        39              Dispersed              287            Bambusoideae
RCE2     7            2,863      41              Dispersed              305            Gramineae

*Based on the haploid genome of rice as 424 Mb (24).

#The copy number of the 168-bp monomer in the rice genome.



Clone pRCS1 contains an 877-bp Sau3AI fragment that hybridizes to the pSau3A9
sequence. Sequencing analysis revealed that the 259 bp at the 3' end of pRCS1 had 80%
sequence identity to the central part (bases 338-602) of the pSau3A9 sequence (see Jiang, J., et
al., Proc. Natl. Acad. Sci. USA 93:14210-14213 (1996), FIG. 1 and SEQ ID NO:8). The first 95
bp in pRCS1 had 76% sequence identity to a Ty3/gypsy class of retrotransposon sequence
reported in maize (GenBank Accession No. AF030633). Nucleotides 171-228 of pRCS1 had
70% sequence identity to a Ty3/gypsy class of retrotransposon sequence reported in Lilium
henryi (X13886). It was also discovered that the pSau3A9 sequence in sorghum has similar
sequence identities to the Ty3/gypsy class of retrotransposons. These results indicated that both
pSau3A9 and pRCS1 probably were derived from retrotransposon-related DNA sequences.

The RCS1 sequence was located in the centromeric regions of all 24 rice chromosomes 
by FISH (see FIG. 2A). The sizes and intensities of the FISH signals were uniform on different 
chromosomes, suggesting that all rice chromosomes contain a similar number of copies of this 
element. Slot blot analysis suggested that there are about 130 copies of RCS1 present in the
haploid genome of japonica rice DV85 (see Table 2). 

Rice genomic DNA was digested with several restriction enzymes and probed with the
259-bp fragment conserved between rice and sorghum. One or a few major bands and several
minor bands were detected in most of the lanes (see FIG. 3). Fiber-FISH using pRCS1 as a probe
did not generate clustered signals. These results suggested that the RCS1 sequence is dispersed
in the centromeric regions of rice chromosomes.

FISH analysis revealed that pRCS1 also hybridized exclusively to centromeric regions of
chromosomes from other Gramineae species (see FIG. 2B-E). The FISH results on rye (see FIG.
2B) and barley (see FIG. 2C) chromosomes showed that hybridization was exclusive to the
primary constrictions. FISH signals also were detected in the centromeres of the supernumerary
B chromosomes from both rye and maize (see FIG. 2B and E). Positive Southern hybridization
signals were detected in all other Gramineae species analyzed, including bamboo, Pharus sp.,
oats, wheat, sugar cane, Ae. squarrosa, and Ag. intermedium. However, homologous sequences
could not be detected by Southern hybridization analysis in dicot species or in any monocot
species outside of Gramineae, indicating that the RCS1 family is sufficiently conserved only in
the grass family Gramineae.

Clone pRCS2 contains a 639-bp Sau3AI fragment consisting of four copies of a tandemly
arranged repeat with a consensus sequence of 168 bp (see FIG. 4). The four copies were 84-91%
identical with one another. The third copy of the repeat contains a 6-bp insertion (TTGGCC) at
base 147. A search of the GenBank database found a highly significant match to a repetitive
DNA element isolated from O. sativa (GenBank Accession No. U63977).

Southern hybridization analysis of rice genomic DNA using probe pRCS2 revealed ladder
patterns using several restriction enzymes, including DpnII, Sau3AI, MspI, HpaII, and HaeIII,
indicating that the RCS2 family is tandemly arranged in the rice genome (see FIG. 5). Several
enzymes produced digestion profiles comprised of monomer and multiples (dimer, trimer,
tetramer, etc.) of the 168-bp basic repeat.
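The ladder pattern follows from simple arithmetic: if a restriction site occurs once per repeat unit but is not cuttable in every unit, the fragments released are whole-number multiples of the monomer. The Python sketch below illustrates this under that simplifying assumption; the function names and the example pattern of cuttable sites are hypothetical, and only the 168-bp monomer size comes from the text.

```python
MONOMER_BP = 168  # consensus repeat length stated in the text


def ladder_sizes(monomer_bp: int, max_units: int) -> list[int]:
    """Expected band sizes for monomer, dimer, trimer, ... up to max_units."""
    return [monomer_bp * n for n in range(1, max_units + 1)]


def fragments_from_array(site_present: list[bool],
                         monomer_bp: int = MONOMER_BP) -> list[int]:
    """Sizes of fragments between consecutive cuttable sites in a tandem array.

    `site_present[i]` is True when repeat unit i carries a cuttable site
    (assumed to sit at the same position in every unit).
    """
    cuts = [i for i, ok in enumerate(site_present) if ok]
    return [(b - a) * monomer_bp for a, b in zip(cuts, cuts[1:])]


print(ladder_sizes(MONOMER_BP, 4))
# Hypothetical 7-unit array in which units 2, 4 and 5 have lost the site.
print(fragments_from_array([True, True, False, True, False, False, True]))
```

In the hypothetical array the three internal fragments correspond to a monomer, a dimer, and a trimer, which is the kind of profile a tandem family produces on a Southern blot.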

Probe pRCS2 hybridized only to the centromeric regions on all rice chromosomes (see
FIG. 2K). Significant variation in the size and intensity of the FISH signals was detected in
different centromeres. Two pairs of chromosomes had strong signals, and a third pair had very
faint signals (see FIG. 2K). All of the signals became weaker as the posthybridization washing
stringency was increased (see FIG. 2L). However, even after washing in 70% formamide at
50°C for 15 minutes, most signals were still discernible (see FIG. 2M), suggesting that the signal
disparity reflects differences in copy number rather than sequence divergence of the RCS2 family
in different rice centromeres. Though the longest chromosome (chromosome 1) had the
strongest signals, it was not possible to relate the copy numbers to the chromosome sizes. It was
evident that the weakest signals were not on the smallest chromosomes (see FIG. 2K).

Three subspecies of O. sativa (AA genome), together with O. glaberrima (AA), O.
rufipogon (AA), O. alta (CCDD), and O. officinalis (CC) were included for FISH analysis. FISH
signals were detected in the centromeric regions from all of the chromosomes of these species.
Southern hybridization analysis revealed that the RCS2 family is present only in the species
within genus Oryza. Homologous sequences could not be detected even at a low stringency in
any plant species outside of genus Oryza.

RCS2 is the most abundant element isolated from BAC 17p22 and has about 1,550
copies, corresponding to 6,200 monomers, in the haploid genome of DV85 (see Table 2 above).
BAC 17p22 contains about 46 copies of this element, corresponding to approximately 39% of the
BAC insert. Fiber-FISH analysis demonstrated that the RCS2 family is organized into various
sizes of uninterrupted arrays in the rice genome. The longest observed block with small
interspersed gaps (<2 μm) was 51 μm (see FIG. 2N). Based on a 2.96-kb/μm resolution of the
Fiber-FISH technique, this block represents approximately 151 kb of uninterrupted RCS2
sequences. The longest observed single Fiber-FISH signal with interspersed gaps larger than 2
μm was 188 μm, representing approximately 556 kb of centromeric DNA sequences.
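The length estimates above are straightforward unit conversions, and a short Python sketch can be used to check them. The constant and function names are hypothetical; the numerical inputs (2.96 kb/μm, 51 μm, 188 μm, 1,550 copies of a four-monomer clone) are those given in the text.

```python
KB_PER_UM = 2.96          # Fiber-FISH resolution stated in the text
MONOMERS_PER_COPY = 4     # pRCS2 contains four copies of the 168-bp repeat


def fiber_length_kb(signal_um: float, kb_per_um: float = KB_PER_UM) -> float:
    """Convert a measured Fiber-FISH signal length (micrometres) to kilobases."""
    return signal_um * kb_per_um


def monomer_count(copies: int, monomers_per_copy: int = MONOMERS_PER_COPY) -> int:
    """Number of monomers represented by a given number of pRCS2-sized copies."""
    return copies * monomers_per_copy


print(round(fiber_length_kb(51)))    # uninterrupted block
print(round(fiber_length_kb(188)))   # longest signal including larger gaps
print(monomer_count(1550))           # monomers per haploid genome
```

Rounded to the nearest kilobase, the two conversions reproduce the 151-kb and 556-kb figures, and the monomer count reproduces the 6,200 entry in Table 2.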

The other five centromeric DNA elements isolated from rice BAC clone 17p22 were 
analyzed by FISH, and all of them hybridized exclusively to the centromeric regions of all rice 
chromosomes (see FIG. 2F-J). One or two pairs of rice metaphase chromosomes showed weak
hybridization when pRCH2, pRCH3, and pRCE2 were used as probes. No relationship could be
confirmed between signal intensities and the sizes of the chromosomes.

The sequence information for these families is listed in Table 2, above. Searching in the
GenBank database did not uncover any significant matches to these sequences except for
pRCH2. Bases 39-102 and 204-232 in pRCH2 had sequence identities to the centromeric CCS1
sequence isolated from B. sylvaticum (see Aragon-Alcaide, L., et al., Chromosoma 105:261-268
(1996); Abbo, S., et al., Chromosome Res. 3:5-15 (1995)). Interestingly, about 120 bp (bases
8-130) of this element had 80% sequence identity to the spacer sequence that separates the rice 5S
rRNA genes. The possibility that this element associates with the 5S rDNA locus was excluded
because the FISH signals from pRCH2 were located proximal to those from the 5S rDNA locus.

In Southern hybridization analysis, all five elements produced one or a few major bands
and several minor bands with several restriction enzymes, similar to the RCS1 family (see FIG.
3), suggesting that they all are dispersed in the rice centromeric regions. The copy numbers of
these elements ranged from 53 to 305 copies per haploid rice genome (see Table 2, above).

All five elements were hybridized to various plant species by Southern hybridization.
The RCE1 family was present only in the species from the Bambusoideae subfamily, including
rice, bamboo, and Pharus sp. (see FIG. 6B), whereas RCH1, RCH2, RCH3, and RCE2 all were
conserved across the Gramineae species (see FIG. 6A for RCH1). Species from subfamilies
Panicoideae and Bambusoideae generally had stronger hybridization signals than those from
subfamily Pooideae (see FIG. 6A).

The cytosine nucleotides, especially those in the dinucleotide sequence 5'-CpG-3', are the most 
common sites for methylation in plant genomes. Methylation occurs at lower frequencies when 
the C and G are separated by 1-2 A/T nucleotides (see Gruenbaum, T., et al. Nature (London) 
292: 860-862 (1981)). The enzymes MspI and HpaII are isoschizomers that recognize the 5'-CCGG-3' 
sequence. Neither enzyme can cut when the 5' C is methylated, and only MspI can cleave when 
the internal cytosine is methylated. Though both enzymes produced similar digestion profiles of 
rice genomic DNA, MspI generated much smaller hybridization bands from all of the rice 
centromeric DNA probes than HpaII did (see FIG. 3 for RCS1 and FIG. 5 for RCS2). For the 
RCS2 element, monomers of the 168-bp basic repeat could be found in the MspI lane, and most of 
the hybridization was in fragments smaller than 2 kb, whereas the majority of hybridization in 
the HpaII lane was larger than 2 kb (see FIG. 5). For the other centromeric elements, DNA 
fragments smaller than 5 kb were not detected in HpaII lanes (see FIG. 3 for RCS1). These 
results suggest that the cytosines of the CpG dinucleotides are heavily methylated in the rice 
centromeric DNA sequences. Restriction enzyme SalI recognizes 5'-GTCGAC-3' and is sensitive 
to the methylation of CpG dinucleotides. Small fragments (<10 kb) that hybridized to the 
centromeric elements were not detected in the SalI lanes (see FIGS. 3 and 5). 
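
The differential sensitivity of the two isoschizomers described above can be illustrated with a small in-silico digest. The following is a minimal sketch, not the procedure used in the experiments: it models a single linear strand, treats methylation as a set of 0-based cytosine positions, and assumes both enzymes cut after the first C of CCGG. The function names and the toy sequence are hypothetical.

```python
import re


def cut_sites(seq, methylated, enzyme):
    """Return 0-based start positions of CCGG sites that the enzyme can cleave.

    methylated: set of 0-based positions of methylated cytosines.
    Rule from the text: neither enzyme cuts if the 5' C is methylated;
    only MspI cuts if the internal C is methylated.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    sites = []
    for match in re.finditer(r"(?=CCGG)", seq.upper()):
        start = match.start()
        if start in methylated:
            continue  # 5' C methylated: blocked for both enzymes
        if (start + 1) in methylated and enzyme == "HpaII":
            continue  # internal C methylated: blocked for HpaII only
        sites.append(start)
    return sites


def fragment_lengths(seq, sites):
    """Fragment sizes after cutting one base into each site (C^CGG)."""
    bounds = [0] + [s + 1 for s in sites] + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy sequence with two CCGG sites whose internal cytosines are methylated.
toy = "AAAACCGGTTTTTTCCGGAAAA"
internal_c = {5, 15}
msp_fragments = fragment_lengths(toy, cut_sites(toy, internal_c, "MspI"))
hpa_fragments = fragment_lengths(toy, cut_sites(toy, internal_c, "HpaII"))
print(msp_fragments, hpa_fragments)
```

In this toy case MspI yields several small fragments while HpaII leaves the molecule intact, which mirrors the pattern of smaller bands in MspI lanes than in HpaII lanes when internal cytosines are methylated.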

1. An isolated and purified nucleic acid comprising a nucleotide sequence of SEQ ID 
NO:7. 

2. A recombinant DNA construct comprising a centromere, wherein said centromere 
comprises a number of highly repetitive regions of DNA having a nucleotide 
sequence of SEQ ID NO:7. 

3. The recombinant DNA construct of claim 2, further comprising a yeast autonomous 
replication sequence. 

4. The recombinant DNA construct of claim 2, further comprising an autonomous 
replication sequence from a higher eukaryotic organism. 

5. The recombinant DNA construct of claim 2, further comprising a yeast telomere. 

6. The recombinant DNA construct of claim 2, further comprising a telomere from a 
higher eukaryotic organism. 

7. The recombinant DNA construct of claim 2, further comprising a selectable marker 
gene. 

8. A plasmid comprising the DNA construct of claim 2. 

9. The plasmid of claim 8, wherein said plasmid further comprises an origin of 
replication and a selectable marker that functions in bacteria. 

10. The plasmid of claim 9, wherein said bacteria is E. coli. 

11. The plasmid of claim 8, wherein said plasmid further comprises an origin of 
replication and a selectable marker that functions in yeast. 

12. The plasmid of claim 8, wherein said yeast is S. cerevisiae. 



13. A plant artificial chromosome vector comprising an autonomous replication 
sequence, two telomere sequences, a centromere sequence having the nucleotide 
sequence of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID 
NO:5, SEQ ID NO:6 and SEQ ID NO:7 or combinations thereof, and at least one 
selectable marker sequence. 

14. The plant artificial chromosome vector of claim 13, wherein the autonomous 
replication sequence is from yeast. 

15. The plant artificial chromosome vector of claim 13, wherein the autonomous 
replication sequence is from a higher eukaryotic organism. 

16. The plant artificial chromosome vector of claim 13, wherein the telomere sequences 
are from a higher eukaryotic organism. 

17. The plant artificial chromosome of claim 16, wherein the telomere sequences are 
from Arabidopsis thaliana. 

18. The plant artificial chromosome vector of claim 13, wherein the telomere sequences 
are from yeast. 

19. A plant cell transformed with the plant artificial chromosome vector of claim 13. 

20. The transformed plant cell of claim 19, wherein the plant cell is from Oryza sativa. 

21. A transgenic plant comprising the transformed plant cell of claim 19. 

22. A method of identifying centromeric DNA in a higher eukaryotic organism, the 
method comprising the steps of: 
hybridizing an isolated nucleic acid selected from the group consisting of SEQ 
ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID 
NO:6, SEQ ID NO:7 and combinations thereof, with a sample of DNA from a 
higher eukaryotic organism; and identifying and isolating centromeric DNA 
from said sample. 

FIG. 1

        10         20         30         40         50
GATCTTTGGA TTGGAAACAG TTAAAGAACA ATATGTGCAT GATGATGATT

        60         70         80         90        100
TTAAAGATGT GTTTTTGCAT TGTAAGGATG GGAAGGCATG GAATAAATTT

       110        120        130        140        150
GTTGTAAATG ATGGTTTTGT GTTTAGAGCT AATAAGCTAT GCATTCCAGC

       160        170        180        190        200
TAGCTCTGTT CGTTTGTTGT TGCTACAGGA AGCACATGGA GGTGGTTTGA

       210        220        230        240        250
TGGGACATTT TGGGGCAAAG AAGACGGAGG ACATACTGGC TGGTCATTTC

       260        270        280        290        300
TTTTGGCCAA AGATGAGGAG AGATGTGGAG AGATTTATTG CTCGCTGCAC

       310        320        330        340        350
GACATGTCAA AAGGCCAAGT CACGCTTAAA TCCACACGAT TTGAAGCCAT

       360        370        380        390        400
ATTTGGGTGA GGGAGATGAG CTTGAGTCGG GGACGACTCA AATGCAAGAA

       410        420        430        440        450
GGGGAGGATG ATGAGGACAT CAGCACCATC TATACATCCA CACCTACACC

       460        470        480        490        500
CACACCATCG CCAACACCAC TTGGCCCTCT TACTCGTGGC AGTGCCCGTC

       510        520        530        540        550
AACTGAACCA TCAAGTAAGT TTATTCTTAA ACTCTTGTCC ATCATATTTA

       560        570        580        590        600
GACAATGGAG ACACGTGCAC TCTTGTTTTG CTTAGGAATG ATGGAGAGGA

       610        620        630        640        650
CCAGAAGCAT AGGGGATTGG TGTAGGCTGG ATTTGGACAG CAAGACAGCA

       660        670        680        690        700
CCAACTTACA ACAACCGCCA TGACTTCATA CAGAGTCCAT TTTAAGCATG

       710        720        730        740        750
CAAGCACTTG ATGGAAAACT CGTCAAGTAT ATTTTTAGAT GGATC

Fig. 2

Fig. 3

FIG. 4

       1                                                   50
A  GATCTT-TTCTACT -GGAATCAAA ATGTTCAAAA AATGCCAAAA CATGATTTTT
B      TT-TTCTACT -GGAATCAAA ATGTTCAAAA AGAGCCAAAA CATGATTTTT
C      TTATTCTACT -GGAATCAAA ATGTTCAAAA AGAGCCAAAA CATGATTTTT
D      TTA-T-TAAT CGGAATCAAA ATGTTCAAAA GGCACCAAAA CATGATTTTT
F      TTatTcTAcT -GGAATcAAA AaGTTCAAAA agagCCAAAA CATGATTTTT

       51                                                 100
A      GGACATATTG GAGTGTATTG GGTGCATTCA TGGCAAAAAC TCACTCCGTG
B      GGACATATTG GAGTGTATTG GGTGCGTTCA TGGCAAAA-C TCACTTCGTG
C      GGACATATTG GAGTGTATTG GGTGCGTTCG TGGCAAAA-C TCACTTCGTG
D      TGACATATTG GAGTGTATTG GGTGCGTTCG TGGCAAAAAC TCACTTCGTG
F      gGACATATTG GAGTGTATTG GGTGCgTTCg TGGcAAAAaC TCACTtCGTG

       101                                                150
A      ATTCGCGCGG CGAACTTTTG TCAATTAATG CCAATAT-TG GG-ACA
B      ATTCGCGCGG CGAACTTTTG TCATTTAATG CCAATATGTG CATACA
C      ATTCGCGCGG CGAACTTTTG TCAATTAATT CCAATATGTG CATATTTTGG
D      ATTCGCGCGG CGAACTTTTG TCATTTAATG CCAATAT-TG GC-ACA G
F      ATTCGCGCGG CGAACTTTTG TCA*TTAATg CCAATATgTG *atAca g

       151            168
A      --CGAG-G-G T-GCGATG (155 bp)
B      --CGAGAGAG T-GCGATG (158 bp)
C      CCCAAA-GTG TTGCGATG (165 bp)
D      --CGA-CGGG T-GCGATC (157 bp)
F      --CgAg*G*G T GCGATg

Fig. 5

Fig. 6

FIG. 13 

GAATTCCATCTCGTGTGGCCGGAATCACAGGCACAAGGTGCTGCGTCACCTATA 

ATGGGCCTATGGGCTTGTAACTTCATTTGGGACCCCGGCCCAGGTGGGGCCCA 

TGTGGGGTGCGCCCCAACCAGGTGGAGCACGACCCTAGGCACCCCTAGGTCGT 

CCTCCCACCCCTATATATAGCTAGGTACCCCTTCAGGGTTTCTTGGGTTTTGATA 

GATTAAAGTTTAGCCATTGCTACTTGCTTGCAGCGCGCGTGTCGCCTAGACCGT 

CCGTCTGCTTGTTCTTCGGAACCCCAACTTATCATTTGTATTAAATTCCTATTTG 

CAATATCAGATTGCTTTTATCTTGTTCTTGCTTGTTTCTTCGATTTGCTTGCAGG 

AATAGGGTTGATCTGCACCGGTAAGATCAACAACCCACGGAGAGGTGTATCGAT 

CGCTAAGGCGCAACACAACGTCTCGTACGGTTGTAGTCGGATCGTCAACGTTTC 

TCCCAAATCGTAGTTATCACAACTCACCGAAAGATCGGGCCAAACAACTGCCTT 

GAGTGTCGAGAGGAACTCAGGGTTCATCAGGTGGTATCAGAGCTTTCGTTGCTC 

GGTGAGTTTTTATCTTCCTATAACCAGAAAATAGCCATACAAAAAAAATTCGTAT 

CATTTACCTAGCCATATTGTGCCATTGCATTGTTCTTTTCAGTTTTTGCTTTGTT 

GAATTTTGTGTTGCATCGTCACGTCGTGTTGCTGGTCTTAGCGTCTAGTTCGTTT 

AGAGTTTCGAGTTCTGGTCACGTTTTGTACCACAGACGTCGCTGCCACCGTGTC 

TCTGTTGTTGTCGGGACCTACGAAAACGGAATCAGCTCGCTGCCACGGTTCTAT 

TTTTGTAGTTTTCAGAAGTTTCTGTCGGTTAGTTTTGTAGTTCTTGAGTTGCATC 

TAGAGTTTGCCGATCCTTGTGTGGGTTTGTTTTGGCCATGCCTCTGTCGTGCAC 

GAGAGGAGGAAGGTGAATTACCATATCTGATTTTGGAAGTAGTCAAGTTTGTTT 

TGGAAAGATATAGTAGATTGGATTGGTTAAAAATCAGTTTCCTTTTTATCCCATA 

CACCAAATTCGGCTGCCATCCACTCCACCTCCTGGCCGAGTCCCACTCGCCCCT 

TTGGCCGAGTCCGACTCCCTCCCTCTCTTCCACGATTCCGAGTTGTGTCAAACA 

CCTTGGCGAATTTTTGTTTGGTGTCGGTTTCGAGATCTGTTTGGAAAAACGGAA 

CCGGCATAAATTCAGCATTCCATTTTTTGTATAATTTTAATTTTGGGTTTAGACT 

TTTACATTTGAGTCCCTGTAAATTTTATATTTATGTTTGAGTCCCTGTAATCTTA 

CATTAAGGTCCTTGAGTCAGTTTTTGTTTAAAAAAATCAAGAAAAAAAAGTGAG 

AAAAAAAAAGGCAGAAAGGTGCAAAAAAAAAGAGAAAAAAAAAGCCGCACAAA 

AAAAACAGAAAAGAAAAGGAAAGAAAAAAAAAAAGAAAAGAAGAAAGAAGAAA 

GAAAGAAAAGAGAAAATACTGTTATGTTTGAGCTGAACTTCATATATCAGACTT 

GTGCATGAGTTGTTCCTAGTGCTATCTTGTGGTATCGTTTGTGTCTAGGCTCGC 

GTCTCTAGTACGTTCTAGCCTAGGACCAGCACGGTACTTGACTTTGAACAATTA 

TTCAACTTTGCATTATCTGATTTGAGCATTTGCTATTCCTTTGCTACATATTTAA 

GCCTACCCAGAGCTCCACATATTTGATTACAGCCGTACCGCAAGTGTTTGCCAA 

GGCATCGATACATCCAACCTTCATTGGGGTGCTTGGTTGAGTCGGTGTATGTCA 

CCATTCCACTTGCATTGGTAAGATCTTGTAAGAGCTTGGTTAAAAGCTTGAGTG 

TGTGCGATTTTTTGACCTGCCACTACCTAGTAGTTAATAGGAACGCGCATATTTT 

TGTGTATGTTTCCTGTTTTCTACTAACAATGGCAGGGATACGCAAGATAATTGG 

GGATAGCTGTGCTCAACATCGACATCTTCGTCGAGACATGAGGAGGGATCAACA 

TGACCATTATGAGGTAAGTGATGATGTTCTAGGTAAGATCAAATCTGCACTGCC 

TTATTTCGAGGGAAACTATGATCCTCGTGCTTACATTAATTGGGAGCTAGCGGT 

TGATAGTGAATTTCAAAAGCATGTCTTGTCGGAGAAACAAAAGGTTATGTGTGC 

CTCTAGTGTTTTAATTAAACATGCTTCTAATGATTGGAAACATCTTTGTAGGCAT 

AACAAAATACCACAATCTTGGAAAGACCTGAAACGATATTTCAGAGATGTTTAT 

GTTCCCATGTATTATGCTGATATTCTGCTCAACAAACTGCAATGTTTAAAACAAG 

ATACCAAAAGTGTTACTTCATACTATCATGATATGCATGCTTGTTTATTACGTTG 

TGGCTTAGATGAATGTGAAGAAGCTACAGAATTGAGGTTTTTACGTGGACTTAA 

CAAAGAAATTCAGGACATGCTTGCTTGTGAAAAGTATAGATCTCTTTCTCATTTG 

TTACAACTTGCTTGCAATGCTGAAAGTAAAATAGAGGAGGATATGAAAAAGAAA 

CACGCTATGTCTTTGCCTCCAATTACTAACTATTTGCAGGAAGTGCGTAATCAT 

GAAAAGGAGGAGAGAGACATGAAAGAGCCACCAATTCCATTGTTCACACTCAA 

GTTCGAGACACCTCCATCATCTAAAGAGGACATCAAAGGTAAAGTAAATGGTAC 

TGAAATTAATCAAGGTGAGTGCATTGTTAACGAAGTAAATTTGTTCACTTTTCAT 

GCAAAAGTAGAGCAACCATTAGTGGAACCAAATGCTGGAATTC 

SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: Jiang, Jiming 
Dong, Fenggao 

(ii) TITLE OF INVENTION: DNA Sequences Specific to Rice 
Centromeres 

(iii) NUMBER OF SEQUENCES: 8 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Rockey, Milnamow & Katz, Ltd. 

(B) STREET: 180 N. Stetson Avenue, 2 Prudential Plaza, 

Suite 4700 

(C) CITY: Chicago 

(D) STATE: Illinois 

(E) COUNTRY: U.S.A. 

(F) ZIP: 60601 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 
(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Mueller, Lisa V. 

(B) REGISTRATION NUMBER: 38,978 

(C) REFERENCE/DOCKET NUMBER: WARF 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 312-616-5400 

(B) TELEFAX: 312-616-5460 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1478 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 



PCT/US00/17535 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

TCATCAATGA TGGGTTTGTT TTCAGAGCTA ACAAGCTATG CATTCCAGCT AGCTCCGTTC 60 

GCTTGTTGTT GTTGCAGGAA GCGCATGGAG GCGATTTGAT GGGACATTTT GGTGCCAAGA 120 

AGACACATGA CATCCTTGCT AGTCATTTCT TTTGGCCACA GATGCGAAGA GATGTTGGCA 180 

GGTTCGTTGC TCGCTACGCT ACATGTCAAA AGGCTAAGTC ACGCTTACAT CCACATGGTT 240 

TGTATATGCC TCTTCCTGTT CCTACTGTTC CTTGGGAAGA TATTTCAATG GATTTTGTGT 300 

TAGGATTGCC TAGGACCAAG AGGGGGCGTG ATAGCATTTT TGTGGTTGTG GATCGAITTI* 360 

CTAAAATGGC ACATTTCATA CCATGTCATA AAACTGATGA TGCTTCTCAT ATCGCTGATT 420 

TGTTCTTTCG AGAAATTGTT CGCTTGCATG GTGTGCCAAA CACAATTGTT TCTGATCGTG 480 

ACACAAAATT TCTTAGCCAT TTTTGGAGAA CTTTGTGGGC TAAATTGGGG ACTAAACTTT 540 

TGTTTTCTAC TACTTGTCAT CCCCAAACTG ATGGACAAAC TGAAGTGGTG AATAGAACCT 600 

TGTCTACTAT GCTTAGGGCT GTTTGAAGAA AAATATCAAG ATGTGGGAAG AATGCTTGCC 660 

TCATATTGAA TTTGCTTATA ATCGTTCCTT GCATTCTACT ACAAAAAATG TGCCCATTTC 720 

AGATTGTGTA TGGTTTGTTG CCTCGTGCTC CAATTGATTT GATGCCTTTA CCATCTTCTG 780 

AGAAACTGAA TTTTGATGCG AAGCAACGTG CTGAGTTGAT GTTAAAACTG CATGAGACAA 840 

CTAAAGAAAA CATAGAGCGC ATGAATGCTA AGTATAAGTT TGCTGGTGAC AAAGGTAGAA 900 

GGGAATTGAA TTTTGAACCT GGAGATTTGG TTTGGTTGCA TTTGCGAAAA GAACGATTTC 960 

CTGATTTGAG AAAGTCTAAA TAGATGCCTA GAGCTGATGG ACCATTTAAA GTGTTAGCAA 1020 

AGATTAATGA GAATGCATAT AAGATTGATT TGCTTGCAGA TTTCGGGGTT AGTCCCACAT 1080 

TTAACATTGC AGATTTGAAG CCGTATATTG GGAGAAAAAG ATGAGCTTGA GTCGAGGATG 1140 

ACTCAAATGC AAGATGGGGA GGATGATGAG GACATCAACA CCATCGATAC ATCCACGTCC 1200 

CCCCATATAC AGCATGATGG TCCTATTACC CGCGCTTGTG CACGTCAACT AAATTATCAG 1260 

GTGATTCTTT CTTGAGTTCA AATTTCCTCG TCTTTATACC TCGGAGACGC GTGCACTCGT 1320 

GTTTTACTCC AGGAACGTAT GGAGAGGATC AAAGGGAAGA GGATTCGCGC GGGGTGGATT 1380 

CGGACTGCAG GGCAGCGCCA ACTTCTGACG GCCGCCACGA CTTCATGCAA ACTCCGATTT 1440 

GGGCGTGCAA GTACTTCATG GAAAGCTTAT CAAGTCTA 1478 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 639 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

GATCTTTTCT ACTGGAATCA AAATGTTCAA AAAATGCCAA AACATGATTT TTGGACATAT 60 

TGGAGTGTAT TGGGTGCATT CATGGCAAAA ACTCACTCCG TGATTCGCGC GGCGAACTTT 120 

TGTCAATTAA TGCCAATATT GGGACACGAG GGTGCGATGT TTTCTACTGG AATCAAAAAG 180 

TTCAAAAAGA GCCAAAACAT GATTTTTGGA CATATTGGAG TGTATTGGGG TGCGTTCGTG 240 

GCAAAACTCA CTTCGTGATT CGCGCGGCGA ACTTTTGTCA TTTAATGCCA ATATGTGCAT 300 

ACACGAGAGA GTGCGATGTT ATTCTACTGG AATCAAAAAG TTCAAAAAGA GCCAAAACAT 360 

GATTTTTGGA CATATTGGAG TGTATTGGGT GCGTTCGTGG CAAAACTCAC TTCGTGATTC 420 

GCGCGGCGAA CTTTTGTCAA TTAATTCCAA TATGTGCATA TTTTGGCCCA AAGTGTTGCG 480 

ATGTTATTAA TCGGAATGAA AAAGTTCAAA AGGCACCAAA ACATGATTTT TTGACATATT 540 

GGAGTGTATT GGGTGCGTTC GTGGGAAAAA CTCACTTCGT GATTCGCGCG GCGAACTTTT 600 

GTCATTTAAT GCCAATATTG GCACAGCGAC GGGTGCGAT 639 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 827 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AAGCTTCATA CTTTGTCCGT TTGAAATAAG ATTTTCTGTG GTTGGTTCTA GTGTGCTCCT 60 

GGTTGTGAAA AAAAATATCA AAAAAATAAT AAAA7VAATAA AAAAAAAGTT GTGCCTCTCT 120 

CCTCAGCACT AGAGAGAGCA AGCAAAAAAA AAAAAAGAGG ATCCAGCGAG CACACCACAC 180 

CATTCCCCCT TTGTCACCAC CTTCCACCAC CGGTTCTTGT TTTCGTTGCT GTTTGGGTTT 240 

GTGTTTTGTG GCGTCCTTTG TGGTTAGACT CGTCGTCTCT ACTTCGACTA GCCTAGGACC 300 

AGCTACTGTA CACTATACCC TCTGAGCGAT TATTCAATTT GCTTTGGCTA ACGTGGTTTC 360 

TAATTCTTCT CTTGAGCAAT GCTTTCAGCC CACCAACAGC TCCACCACAA ACCCGACTAC 420 

AGCTTGACAG GTCTTGTTGC TACAGCAACG ATACACCTCG TTCCACGGTA GCTAACTAGT 480 

GGTGTTGCTG TCCCCTACCT GTGGGCAAGG TAAGAACTGG TAAGAGCTTG TGAGACAAGC 540 

TGCGAGTGAA GTGAGAGTGA GCGTCTTGCA AAAGCTACAT CCCAATAGTT GTGTAGGGCT 600 

TACATTACAT CCTCCATTGT TTTGTTGTCC TTCATTTCTA AACCATGGCA GGTCCA3AGG 660 

ATAAAGATGA GTAGGGTGCA TCTCATTCCC CACGCACCAA AGGCATCATC CAATACTTTA 720 

CAAGGCAAGT GAAGCAGCAC ACAGAAGGAC TTGATACAGA TTTGCAGGTG ACAAATGAGA 780 

AGATTGGACA ATTGGAGTCC ACGCAGATCT CCACCAACAC TAAGCTT 827 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1201 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

AAGCTTCTCG CCACAAAAAC AGAATTTTGC AAGAGGAATT TGGCGATTTT AAATTTGCCG 60 

AACCCAGCAA AGCTGGGGAG AAACGGATGT AACTTTTTTT GTAGATGTCC GTGGATATAT 120 

TATAAGCCTA ATTTCGAGTC CGTATGAAAA AGATATGACT GTTTTAAGGA AGGTCACTCG 180 

AATCAGGGTT CTATTTCCTT TCTAGTGGCG GCAGGATACA AGGCAACTCA CAAAAGCTAT 240 

CGGCAACAGA TCAAACACAA ATAGAAAGTG ATTTGGTGGG TGACTGCGAC GGTGGTGACG 300 

ATGGTGATGA CGGCAAAATT GATGAAGTGG CGATGTGCGG TGGTGGCAAG AAC7CGAAAC 360 

TCTAATCGCC TAACACTAAG ACCAGCAAAT TAATAAAACC GATGCAATCG AAAACTCAAC 420 

AAGCCTTAAC TAAGCAGTAC TAATGAAGAG GCACAATGGT ATATATGATC ACCGAAAAAC 480 

TAATCTACTA TTTTTTTGTT GTTGTCTTTT CTGGACTATA GGAAAAAAAA ATAATGACGA 540 

AGGGAAGGGT AAATTCTCTC ACCGATGAAC CACGCTCTGC TACCACCTGA TGCGAACCCG 600 

CTAGGGTTTG TGGCCCGATC TTTCGATGAG AGGTAAGGGA TAACTCGATT AGATGAAGTC 660 

GACGTTCACG GCCCCACTAC AACTGTCCAA AGACGCTGTG CCTTACCAAC CGATACACCG 720 

TCTCCAATGG TCGCCGCGAA CTTGTGGTGC GCGTCAACCC GGCCACGAGG GCACTCGTCC 780 

TGCAAGCAAT CGAAGAACAA GCAAGAACAA GTAGGACAAG CACTGAAATT GCTAGATAAA 840 

GATGAAAGTT TCACTATCAA ACACAATATG GTGGGGTTCC GATTACAGGC AGACGCGGCG 900 

GTTTAGCCAA CACGCGCGCT GCGAGCCGGT AGCAAGAAGC TATCTTCTAT CATCAAAACC 960 

CGCCTGTTTT TGGCGGCGAC TAGAGGTATA ACACAAGAGG GGAAGAACGA CCATAGGGTC 1020 

GTGCTCCAAC CCTAGGACGC GCCCCTAATG GGCCCAACAT GGATACACAG CCCATTGGGC 1080 

CAAAAGAGGT GACGCAGCAC CATGGACAGA AAATAGTCGG GAGTAAAATG ACAATTGCGG 1140 

CCGCTCCAGA ACAGATATGG ACATGTGGCT GGATCCACTT GAAAGTAGAC TTGATAAGCT 1200 

T 1201 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1341 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

AAGCTTGTCG CCCGCGGCCA CTTTAGNCTT GCTGCTCTTG TGACGCATTT TGATGCTCTC 60 

AATGCTGGAG GCAATGGTGG GGGTAATGAT GATGACATTG ACGGAGAGTA CGTCGAGGAT 120 

AATTTGGAGG ATGAATATAT TGCTGACACT GAACAAGATG ATCGAGATGC TCGAGATCGT 180 

CGACGGCTAC ACAACAATCG ACGAGGTATG GGTGGTCGCC GCCGACGCGA GGTACGCAAC 240 

AATGATGATG CTTTTTCTAA AATTAAATTT AAGATTCCTC CTTTTGATGG AAAATATGAT 300 

CCTGATGCGT ACCTCTCTTG GGAGATTGCT GTTGACCAAA AATTTGCATG CCATGAGTTT 360 

CCTGAGAGCA CTAGAGTTAG AGCTGCTACG AGTGAGTTCA CCGATTTTGC TTTGGTTTGG 420 

TGGATAGAAC ATGGCAAGAA AAATCCTAAT AACATGCCAC AAACTTGGGA TGCTTTGAAA 480 

CGGGTCATGC GGGCTAGATT TGTTCCTTCT TACTATGCAC GCGACTTGCT GAATAGGTTG 540 

CAACAATTGA GACAGGGTGC GAAAAGTGTA GAGGAATATT ATCAGGAGTT ACAAATGGGC 600 

TTGCTTCGTT GTAATTTAGA GGAAACTGAG GACGCTGCCA TGGCTAGATT TTTGGGTGGG 660 

TTAAACCGCG AGATTTATGA CATCGTAGAC TATAAAGATT ACGCTAATAT GACCCGATTG 720 

TTTCATCTTG CTTGTAAGGC TGAAAGGGAA GTGCAAGGAC GACGTGCTAG TGCCAAGGCT 780 

AATTTTTCTG CAGGTAAAAC TTCATCATGG CAGACACGCA CCACTCCTCC GGCCGGCCGT 840 

ACTGCTTCTC CATCTTCCAC ACCCACAACC AGTCGAGCAG CACCTCCTCC ATCTAGTGAC 900 

AAGTCAGCGA CAAAGGCTGC TCAGCCAGCA CCGAGTGCTT CTTCAATGGC ATCCACAGGC 960 

CGAATGAGAG ATGTTCAGTG CCACCGTTGC AAGGGCTTTG GGCATGTGCA GCGTGACTGC 1020 

CCTAGCAAGC GAGTTTTGGT AGTCAAAAAC GATGGTGAGT ACTCCTCTGC TAGTGATTTC 1080 

GATGATGATA CACTTGCTTT GCTTGCGGCT GACCATGCAG ATAATGAGCC ACCGGAAGAG 1140 

CACATTGGGG CTGCATTTGC GGATCACTAT GAGAGCCTCA TTGTGCAGCG TGTCCTTAGC 1200 

GCACAAATGG AGAAGGCGGA GCAAAATCAG CGACACACGT TGTTCCAAAC AAAGTGTGTC 1260 

GTCAAAGAGC GTTGTTGCCG CATGATCATT GATGGAC-GTA GCTGCAACAA CTTGGCTAGC 1320 

AGCGAGATGG TGGAGAAGCT T 1341 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 701 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GAATTCCTTT GTCACAAGTT GATTTACTCG CTGTTCCTTG TGATAAAGAA GAGTTGTGTG 60 

ATAATGCTTC ACTTAAATCC ATGCCACAAC TAGTGAATGA ACATGCTATT CCTAGTGTAT 120 

CTTTCTGTGC TGATTTTAAG CATGTTGTTC ACATTGCTAA TAGAGTAGAG GAACGTGAAT 180 

TGACATCTTC TTTAAATACT TTGGGCTATG TTCAGTTTGC TGATTTTCAT GAGCTCGATA 240 

ATTTGAAGGA GAAATTATTT CCTAAGTCTG ATTTGCCATG TCCAAGTAAC GCTATTTTTC 300 

ATCTCTTTGG TGAATATAAT GATAGAGGAA TATATTTGGT GCATAGAGTT TACATCTGTT 360 

CAGATTTAGA ACCTCCTGTA CATGTGGATA AAACATGCAA GCTAGAGAGA AATGTTATTG 420 

CTAACAAAAT TGTCTCGAGT TTGTCTTGTT TTGATTGGAC GAAACAGGTT GTTGTAGAAT 480 

GGTACTACGC AATAGAGCAC CACTATGGAG ATAATACCGA GGACGGTTTT CCATTGAAGA 540 

TACGGGGAGG TATGATGTGA CCATGGCTAC ACACGGATGC AACTCATGTG ACTCACATGC 600 

ATAGGATGAA CAGAGAAGAT ATCGAGATCA CATCGTCCAA GTGCTGGAAT CCGATTCGGC 660 

CACACTACTA CTACTGCTGA CTTCAAACGG CCGCCGAATT C 701 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 2863 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Oryza sativa subsp. indica 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GAATTCCATC TCGTGTGGCC GGAATCACAG GCACAAGGTG CTGCGTCACC TATAATGGGC 60 

CTATGGGCTT GTAACTTCAT TTGGGACCCC GGCCCAGGTG GGGCCCATGT GGGGTGCGCC 120 

CCAACCAGGT GGAGCACGAC CCTAGGCACC CCTAGGTCGT CCTCCCACCC CTATATATAG 180 

CTAGGTACCC CTTCAGGGTT TCTTGGGTTT TGATAGATTA AAGTTTAGCC ATTGCTACTT 240 

GCTTGCAGCG CGCGTGTCGC CTAGACCGTC CGTCTGCTTG TTCTTCGGAA CCCCAACTTA 300 

TCATTTGTAT TAAATTCCTA TTTGCAATAT CAGATTGCTT TTATCTTGTT CTTGCTTGTT 360 

TCTTCGATTT GCTTGCAGGA ATAGGGTTGA TCTGCACCGG TAAGATCAAC AACCCACGGA 420 

GAGGTGTATC GATCGCTAAG GCGCAACACA ACGTCTCGTA CGGTTGTAGT CGGATCGTCA 480 

ACGTTTCTCC CAAATCGTAG TTATCACAAC TCACCGAAAG ATCGGGCCAA ACAACTGCCT 540 

TGAGTGTCGA GAGGAACTCA GGGTTCATCA GGTGGTATCA GAGCTTTCGT TGCTCGGTGA 600 

GTTTTTATCT TCCTATAACC AGAAAATAGC CATACAAAAA AAATTCGTAT CATTTACCTA 660 

GCCATATTGT GCCATTGCAT TGTTCTTTTC AGTTTTTGCT TTGTTGAATT TTGTGTTGCA 720 

TCGTCACGTC GTGTTGCTGG TCTTAGCGTC TAGTTCGTTT AGAGTTTCGA GTTCTGGTCA 780 

CGTTTTGTAC CACAGACGTC GCTGCCACCG TGTCTCTGTT GTTGTCGGGA CCTACGAAAA 840 

CGGAATCAGC TCGCTGCCAC GGTTCTATTT TTGTAGTTTT CAGAAGTTTC TGTCGGTTAG 900 

TTTTGTAGTT CTTGAGTTGC ATCTAGAGTT TGCCGATCCT TGTGTGGGTT TGTTTTGGCC 960 

ATGCCTCTGT CGTGCACGAG AGGAGGAAGG TGAATTACCA TATCTGATTT TGGAAGTAGT 1020 

CAAGTTTGTT TTGGAAAGAT ATAGTAGATT GGATTGGTTA AAAATCAGTT TCCTTTTTAT 1080 

CCCATACACC AAATTCGGCT GCCATCCACT CCACCTCCTG GCCGAGTCCC ACTCGCCCCT 1140 

TTGGCCGAGT CCGACTCCCT CCCTCTCTTC CACGATTCCG AGTTGTGTCA AACACCTTGG 1200 

CGAATTTTTG TTTGGTGTCG GTTTCGAGAT CTGTTTGGAA AAACGGAACC GGCATAAATT 1260 

CAGCATTCCA TTTTTTGTAT AATTTTAATT TTGGGTTTAG ACTTTTACAT TTGAGTCCCT 1320 

GTAAATTTTA TATTTATGTT TGAGTCCCTG TAATCTTACA TTAAGGTCCT TGAGTCAGTT 1380 

TTTGTTTAAA AAAATCAAGA AAAAAAAGTG AGAAAAAAAA AGGCAGAAAG GTGCAAAAAA 1440 

AAAGAGAAAA AAAAAGCCGC ACAAAAAAAA CAGAAAAGAA AAGGAAAGAA AAAAAAAAAG 1500 

AAAAGAAGAA AGAAGAAAGA AAGAAAAGAG AAAATACTGT TATGTTTGAG CTGAACTTCA 1560 

TATATCAGAG TTGTGCATGA GTTGTTCCTA GTGCTATCTT GTGGTATCGT TTGTGTCTAG 1620 

GCTCGCGTCT CTAGTACGTT CTAGCCTAGG ACCAGCACGG TACTTGACTT TGAACAATTA 1680 

TTCAACTTTG CATTATCTGA TTTGAGCATT TGCTATTCCT TTGCTACATA TTTAAGCCTA 1740 

CCCAGAGCTC CACATATTTG ATTACAGCCG TACCGCAAGT GTTTGCCAAG GCATCGATAC 1800 

ATCCAACCTT CATTGGGGTG CTTGGTTGAG TCGGTGTATG TCACCATTCC ACTTGCATTG 1860 

GTAAGATCTT GTAAGAGCTT GGTTAAAAGC TTGAGTGTGT GCGATTTTTT GACCTGCCAC 1920 

TACCTAGTAG TTAATAGGAA CGCGCATATT TTTGTGTATG TTTCCTGTTT TCTACTAACA 1980 

ATGGCAGGGA TACGCAAGAT AATTGGGGAT AGCTGTGCTC AACATCGACA TCTTCGTCGA 2040 

GACATGAGGA GGGATCAACA TGACCATTAT GAGGTAAGTG ATGATGTTCT AGGTAAGATC 2100 

AAATCTGCAC TGCCTTATTT CGAGGGAAAC TATGATCCTC GTGCTTACAT TAATTGGGAG 2160 

CTAGCGGTTG ATAGTGAATT TCAAAAGCAT GTCTTGTCGG AGAAACAAAA GGTTATGTGT 2220 

GCCTCTAGTG TTTTAATTAA ACATGCTTCT AATGATTGGA AACATCTTTG TAGGCATAAC 2280 

AAAATACCAC AATCTTGGAA AGACCTGAAA CGATATTTCA GAGATGTTTA TGTTCCCATG 2340 

TATTATGCTG ATATTCTGCT CAACAAACTG CAATGTTTAA AACAAGATAC CAAAAGTGTT 2400 

ACTTCATACT ATCATGATAT GCATGCTTGT TTATTACGTT GTGGCTTAGA TGAATGTGAA 2460 

GAAGCTACAG AATTGAGGTT TTTACGTGGA CTTAACAAAG AAATTCAGGA CATGCTTGCT 2520 

TGTGAAAAGT ATAGATCTCT TTCTCATTTG TTACAACTTG CTTGCAATGC TGAAAGTAAA 2580 

ATAGAGGAGG ATATGAAAAA GAAACACGCT ATGTCTTTGC CTCCAATTAC TAACTATTTG 2640 

CAGGAAGTGC GTAATCATGA AAAGGAGGAG AGAGACATGA AAGAGCCACC AATTCCATTG 2700 

TTCACACTCA AGTTCGAGAC ACCTCCATCA TCTAAAGAGG ACATCAAAGG TAAAGTAAAT 2760 

GGTACTGAAA TTAATCAAGG TGAGTGCATT GTTAACGAAG TAAATTTGTT CACTTTTCAT 2820 

GCAAAAGTAG AGCAACCATT AGTGGAACCA AATGCTGGAA TTC 2863 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 745 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Sorghum bicolor 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: pSau3A9 

(x) PUBLICATION INFORMATION: 

(A) AUTHORS: Jiang, Jiming 

Nasuda, Shuhei 

Dong, Fenggao 

Scherrer, Christopher W. 

Woo, Sung-Sick 

Wing, Rod A. 

Gill, Bikram S. 

Ward, David C. 

(B) TITLE: A Conserved Repetitive DNA Element Located in 

Centromeres of Cereal Chromosomes 

(C) JOURNAL: Proc. Natl. Acad. Sci. U.S.A. 

(D) VOLUME: 93 

(F) PAGES: 14210-14213 

(G) DATE: November-1996 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GATCTTTGGA TTGGAAACAG TTAAACGGCA ATATGTGCAT GATGATGATT TTAAAGATGT 60 



GTTTTTGCAT TGTAAGGATG GGAAGGCATG GAATAAATTT GTTGTAAATG ATGGTTTTGT 120 

GTTTAGAGCT AATAAGCTAT GCATTCCAGC TAGCTCTGTT CGTTTGTTGT TGCTACAGGA 180 
AGCACATGGA GGTGGTTTGA TGGGACATTT TGGGGCAAAG AAGACGGAGG ACATACTGGC 240 



TGGTCATTTC TTTTGGCCAA AGATGAGGAG AGATGTGGAG AGATTTATTG CTCGCTGCAC 300 

GACATGTCAA AAGGCCAAGT CACGCTTAAA TCCACACGAT TTGAAGCCAT ATTTGGGTGA 360 

GGGAGATGAG CTTGAGTCGG GGACGACTCA AATGCAAGAA GGGGAGGATG ATGAGGACAT 420 

CAGCACCATC TATACATCCA CACCTACACC CACACCATCG CCAACACCAC TTGGCCCTCT 480 

TACTCGTGCC AGTGCCCGTC AACTGAACCA TCAAGTAAGT TTATTCTTAA ACTCTTGTCC 540 

ATCATATTTA GACAATGGAG ACACGTGCAC TCTTGTTTTG CTTAGGAATG ATGGAGAGGA 600 

CCAGAAGCAT AGGGGATTGG TGTAGGCTGG ATTTGGACAG CAAGACAGCA CCAACTTACA 660 



ACAACCGCCA TGACTTCATA CAGAGTCCAT TTTAAGCATG CAAGCACTTG ATGGAAAACT 720 

CGTCAAGTAT ATTTTTAGAT GGATC 745 
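
Because each record in the listing above declares a LENGTH, the sequence lines can be checked against that declaration after transcription. The following is a minimal sketch under stated assumptions: the caller passes only the sequence lines of one record, bases are limited to A, C, G, T and N, and the function names are hypothetical. The sample uses the first and last lines of SEQ ID NO: 8 only, so its length (85) is not the declared 745.

```python
import re


def parse_listing_lines(lines):
    """Concatenate the bases found on sequence-listing lines.

    Position counters (for example "60", or a split counter such as "12 0")
    and whitespace are discarded; only A, C, G, T and N are kept. Any other
    stray character is dropped too, so a damaged base shows up as a length
    mismatch rather than as a wrong base.
    """
    return "".join(
        "".join(re.findall(r"[ACGTN]", line.upper())) for line in lines
    )


def matches_declared_length(seq, declared_bp):
    """True if the parsed sequence has the number of bases the record declares."""
    return len(seq) == declared_bp


# First and last sequence lines of SEQ ID NO: 8, as printed in the listing.
sample = [
    "GATCTTTGGA TTGGAAACAG TTAAACGGCA ATATGTGCAT GATGATGATT TTAAAGATGT 60",
    "",
    "CGTCAAGTAT ATTTTTAGAT GGATC 745",
]
seq = parse_listing_lines(sample)
print(len(seq), matches_declared_length(seq, 85))
```

Running the same check on every line of a record, with the declared LENGTH as the second argument, gives a quick way to spot lines where characters were lost or added during transcription.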